We give computable bounds on the rate of convergence of the
transition probabilities to the stationary distribution for a certain
class of geometrically ergodic Markov chains. Our results are different
from earlier estimates of Meyn and Tweedie, and from estimates
using coupling, although we start from essentially the same assumptions
of a drift condition toward a “small set.” The estimates show a
noticeable improvement on existing results if the Markov chain is
reversible with respect to its stationary distribution, and especially so if
the chain is also positive. The method of proof uses the first-entrance-
last-exit decomposition, together with new quantitative versions of a
result of Kendall from discrete renewal theory.


1. Introduction. Let $\{X_n : n \ge 0\}$ be a time homogeneous Markov chain
on a state space $(S, \mathcal{B})$. Let $P(x, A)$, $x \in S$, $A \in \mathcal{B}$, denote the
transition probability and let $P$ denote the corresponding operator on
measurable functions $S \to \mathbb{R}$. There has been much interest and activity
recently in obtaining computable bounds for the rate of convergence of the
time $n$ transition probability $P^n(x, \cdot)$ to a (unique) invariant probability
measure $\pi$. These estimates are of importance for simulation techniques
such as Markov chain Monte Carlo (MCMC).

Throughout this paper we assume the following conditions are satisfied. 

(A1) Minorization condition. There exist $C \in \mathcal{B}$, $\beta > 0$ and a probability
measure $\nu$ on $(S, \mathcal{B})$ such that

$$P(x, A) \ge \beta\nu(A)$$

for all $x \in C$ and $A \in \mathcal{B}$.

(A2) Drift condition. There exist a measurable function $V : S \to [1, \infty)$ and
constants $\lambda < 1$ and $K < \infty$ satisfying

$$PV(x) \le \begin{cases} \lambda V(x), & \text{if } x \notin C, \\ K, & \text{if } x \in C. \end{cases}$$

(A3) Strong aperiodicity condition. There exists $\tilde\beta > 0$ such that $\beta\nu(C) \ge \tilde\beta$.
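
As a hedged illustration of how the constants in (A1)-(A3) can be read off in the simplest setting, the following sketch treats a finite state space, where the kernel is a row-stochastic matrix. The function name `check_drift_minorization`, the example matrix and the choice of $V$ are assumptions made for this illustration only; they do not come from the paper.

```python
def check_drift_minorization(P, C, V):
    """Return (beta, nu, lam, K, beta_tilde) for a finite-state chain.

    P is a row-stochastic matrix (list of lists), C a set of state indices,
    V a list with V[x] >= 1. Illustrative helper, not the paper's method.
    """
    n = len(P)
    # (A1): beta * nu(y) <= P(x, y) for all x in C, so the largest admissible
    # beta is the total mass of the pointwise minimum of the rows in C.
    mins = [min(P[x][y] for x in C) for y in range(n)]
    beta = sum(mins)
    nu = [m / beta for m in mins]
    # (A2): PV(x) <= lam * V(x) off C and PV(x) <= K on C.
    PV = [sum(P[x][y] * V[y] for y in range(n)) for x in range(n)]
    lam = max(PV[x] / V[x] for x in range(n) if x not in C)
    K = max(PV[x] for x in C)
    # (A3): beta * nu(C) >= beta_tilde; here we take equality.
    beta_tilde = beta * sum(nu[y] for y in C)
    return beta, nu, lam, K, beta_tilde


# Hypothetical 4-state chain with small set C = {0, 1} and V(x) = 2**x.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.25, 0.25, 0.0],
     [0.0, 0.6, 0.2, 0.2],
     [0.0, 0.0, 0.7, 0.3]]
V = [1.0, 2.0, 4.0, 8.0]
beta, nu, lam, K, beta_tilde = check_drift_minorization(P, {0, 1}, V)
```

For this toy chain the drift condition holds with $\lambda < 1$, so all three assumptions are satisfied; on a general state space the same constants usually have to be bounded analytically rather than computed.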


The following result converts information about the one-step behavior 
of the Markov chain into information about the long term behavior of the 
chain. 


Theorem 1.1. Assume (A1)-(A3). Then $\{X_n : n \ge 0\}$ has a unique
stationary probability measure $\pi$, say, and $\int V\,d\pi < \infty$. Moreover, there exists
$\rho < 1$ depending only (and explicitly) on $\beta$, $\tilde\beta$, $\lambda$ and $K$ such that whenever
$\rho < \gamma < 1$ there exists $M < \infty$ depending only (and explicitly) on $\gamma$, $\beta$, $\tilde\beta$, $\lambda$
and $K$ such that

$$\sup_{|g| \le V} \left| (P^n g)(x) - \int g\,d\pi \right| \le M V(x)\gamma^n \tag{1}$$

for all $x \in S$ and $n \ge 0$, where the supremum is taken over all measurable
$g : S \to \mathbb{R}$ satisfying $|g(x)| \le V(x)$ for all $x \in S$. Formulas for $\rho$ and $M$ are
given in Section 2.1. In particular, $P^n g(x)$ and $\int g\,d\pi$ are both well defined
whenever

$$\|g\|_V = \sup\{|g(x)|/V(x) : x \in S\} < \infty.$$

The proof of Theorem 1.1 appears in Section 4. If we restrict to functions
$g$ on the left-hand side of (1) that satisfy $|g(x)| \le 1$, we obtain the total
variation norm $\|P^n(x, \cdot) - \pi\|_{TV}$. So the inequality (1) is a strong version of
the condition of geometric ergodicity, which says that for each $x \in S$ there
exists $\gamma < 1$ such that

$$\gamma^{-n}\|P^n(x, \cdot) - \pi\|_{TV} \to 0 \quad \text{as } n \to \infty.$$
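
A small numerical sketch may help fix the normalization of the total variation norm used here (the supremum over $|g| \le 1$, so no factor $1/2$). It uses a hypothetical two-state chain with jump probabilities `a` and `b`, chosen only for illustration, for which the distance to stationarity is known in closed form.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tv_to_stationary(P, pi, x, n):
    """sum_y |P^n(x, y) - pi(y)|, i.e. the sup over |g| <= 1 in (1)."""
    size = len(P)
    Pn = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        Pn = mat_mul(Pn, P)
    return sum(abs(Pn[x][y] - pi[y]) for y in range(size))


# Hypothetical two-state chain: 0 -> 1 with prob a, 1 -> 0 with prob b.
a, b = 0.3, 0.2
P = [[1 - a, a], [b, 1 - b]]
pi = [b / (a + b), a / (a + b)]
values = [tv_to_stationary(P, pi, 0, n) for n in range(11)]
```

Here the second eigenvalue of the matrix is $1 - a - b = 0.5$, so the computed distances decay like $0.5^n$ and the geometric ergodicity condition holds for every $\gamma \in (0.5, 1)$.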

This concept was introduced in 1959 by Kendall [5] for countable state
spaces. Important advances were made by Vere-Jones [22] in the countable
setting, and by Nummelin and Tweedie [12] and Nummelin and Tuominen
[11] for general state spaces. The condition in (1) is that of $V$-uniform
ergodicity. Information about the theories of geometric ergodicity and $V$-uniform
ergodicity is given in Chapters 15 and 16 of [8]. Results that relate the
different notions of geometric ergodicity are also given in [13].

To date two basic methods have been used to obtain computable
convergence rates. One method, introduced by Meyn and Tweedie [9], is based on
renewal theory. In fact Theorem 1.1 is a restatement of Theorems 2.1-2.3 in
[9], except that we give different formulas for $\rho$ and $M$. Our results in this
paper use this method. The renewal theory method is easiest to describe
when $C$ is an atom, that is, $P(x, A) = \nu(A)$ for all $x \in C$ and $A \in \mathcal{B}$. In this
case, the Markov process $\{X_n : n \ge 0\}$ has a regeneration, or renewal, time
whenever $X_n \in C$. Precise estimates are based on the regenerative
decomposition, or first-entrance-last-exit decomposition; see the proof of Proposition
4.2. This method requires information about the regeneration time

$$\tau = \inf\{n > 0 : X_n \in C\}$$

which may be obtained using the drift condition (A2). It also requires
information on the rate of convergence of the renewal sequence
$u_n = \mathbb{P}(X_n \in C \mid X_0 \in C)$ as $n \to \infty$. It is at this point that the aperiodicity condition (A3)
is used. More generally, if $C$ is not an atom, then the renewal method may be
applied to the split chain associated with the minorization condition (A1);
see Section 4.2 for details of this construction.

The other main method, introduced by Rosenthal [18], is based on
coupling theory, and relies on estimates of the coupling time
$T = \inf\{n \ge 0 : X_n = X'_n\}$ for some bivariate process $\{(X_n, X'_n) : n \ge 0\}$ where each component is a
copy of the original Markov chain. The minorization condition (A1) implies
that the bivariate process can be constructed so that

$$\mathbb{P}(X_{n+1} = X'_{n+1} \mid (X_n, X'_n) \in C \times C) \ge \beta.$$

Therefore, coupling can be achieved with probability $\beta$ whenever
$(X_n, X'_n) \in C \times C$. It remains to estimate the hitting time $\inf\{n \ge 0 : (X_n, X'_n) \in C \times C\}$.
If the Markov chain is stochastically monotone and $C$ is a bottom or top
set, then the univariate drift condition (A2) is sufficient. See the results
in [7] and [21] for the case when $C$ is an atom, and in [16] for the general
case. For stochastically monotone chains, the coupling method appears to be
close to optimal. In the absence of stochastic monotonicity, a drift condition
for the bivariate process is needed. This can often be achieved using the
same function $V$ that appears in the (univariate) drift condition, but at the
cost of enlarging the set $C$ and increasing the effective value of $\lambda$. Further
information about these two methods and their relationship to our results
appears in Section 7.

Our computations for $\rho$ and $M$ in Theorem 1.1 are valid for a very large
class of Markov chains and, consequently, can be very far from sharp in
particular cases. They can be improved dramatically in the setting of reversible
Markov chains.

Theorem 1.2. Assume (A1)-(A3) and that the Markov chain is
reversible (or symmetric) with respect to $\pi$, that is,

$$\int_S Pf(x)g(x)\,\pi(dx) = \int_S f(x)Pg(x)\,\pi(dx)$$

for all $f, g \in L^2(\pi)$. Then the assertions of Theorem 1.1 hold with the
formulas for $\rho$ and $M$ in Section 2.2.

Reversibility is an intrinsic feature of many MCMC algorithms, such as 
the Metropolis-Hastings algorithm and the random scan Gibbs sampler. 

Theorem 1.3. In the setting of Theorem 1.2 assume also that the Markov
chain is positive in the sense that

$$\int_S Pf(x)f(x)\,\pi(dx) \ge 0$$

for all $f \in L^2(\pi)$. Then the assertions of Theorem 1.1 hold with the formulas
for $\rho$ and $M$ in Section 2.3.

The proofs of Theorems 1.2 and 1.3 appear in Section 5, and some
consequences for the spectral gap of $P$ in $L^2(\pi)$ appear in Section 6. For reversible
positive Markov chains, our formulas for $\rho$ give the same values as the
formulas given by Lund and Tweedie [7] (atomic case) and Roberts and Tweedie
[16] (nonatomic case) under the assumption of stochastic monotonicity. The
random scan Gibbs sampler is reversible and positive (see [6], Lemma 3). If
$\{X_n : n \ge 0\}$ is one component in a two-component deterministic scan Gibbs
sampler, then it is reversible and positive. Moreover, if a transition kernel $P$
is reversible with respect to $\pi$, then both the kernel $P^2$ for the two-skeleton
chain and also the kernel $(I + P)/2$ for the binomial modification of the
chain (see [20]) are reversible and positive. In particular, any discrete time
skeleton of a continuous time reversible Markov process is positive.

In Section 8 we give numerical comparisons between our estimates and
those obtained using [9] and the coupling method. The four Markov chains
considered are “benchmark” examples used in earlier papers. Note that
Theorem 1.1 outperforms the estimates given in [9]. For reversible chains,
Theorem 1.2 is sometimes comparable with the coupling method, and sometimes
noticeably better. For chains which are reversible and positive, Theorem 1.3
outperforms the coupling method.

In this paper our assumptions (A1)-(A3) all involve just the time 1
transition probabilities. In principle, our methods extend to a more general setting
where one or more of the conditions involves $m$-step transitions for some
$m > 1$. However, the calculations are much more cumbersome; we omit the
details. Note that our method typically allows smaller $C$ than does the
coupling method (see Section 7.2) and so there is less need to pass to
minorization conditions involving time $m > 1$ (see the example in Section 8.4).

For the remainder of this introduction, we focus our attention on the
formula for $\rho$. Define $\rho_V$ to be the infimum of all $\gamma$ for which an inequality
of the form (1) holds true. Thus $\rho_V$ is the spectral radius of the operator
$P - 1 \otimes \pi$ acting on the Banach space $(B_V, \|\cdot\|_V)$, say, of measurable functions
$g : S \to \mathbb{R}$ such that $\|g\|_V < \infty$. We look for inequalities $\rho_V \le \rho$, where $\rho$ is
computable from the time 1 transition kernel.

At the heart of our calculations is an estimate on the rate of convergence
of $\mathbb{P}_\nu(X_n \in C)$ to $\pi(C)$ as $n \to \infty$. More precisely, define

$$\rho_C = \limsup_{n \to \infty} |\mathbb{P}_\nu(X_n \in C) - \pi(C)|^{1/n}.$$

It is easy to verify [by taking $g(x) = \mathbf{1}_C(x)$ in (1), integrating with respect
to $\nu$ and using $\int V\,d\nu < \infty$] that $\rho_C \le \rho_V$. In the case that $C$ is an atom, we
show (as a consequence of Propositions 4.1 and 4.2) that

$$\rho_V \le \max(\lambda, \rho_C). \tag{2}$$

Suppose instead that $C$ is not an atom, so that $\beta < 1$ in assumption (A1). We
consider the associated split chain (see Section 4.2) and apply the atomic
techniques to the split chain. In this case we show (as a consequence of
Propositions 4.3 and 4.4) that

$$\rho_V \le \max(\lambda, (1 - \beta)^{1/\alpha_1}, \rho_C), \tag{3}$$

where $\alpha_1 = 1 + \left(\log\frac{K - \beta}{1 - \beta}\right)/(\log\lambda^{-1})$. We remark that
$\max(\lambda, (1 - \beta)^{1/\alpha_1}) = \beta_{RT}^{-1}$, where $\beta_{RT}$ is the estimate obtained by Roberts and Tweedie ([15],
Theorem 2.3) for the radius of convergence of the generating function of
the regeneration time for the split chain. Therefore, (3) may be rewritten
$\rho_V \le \max(\beta_{RT}^{-1}, \rho_C)$.

It remains to get a good upper bound on $\rho_C$. We do this using renewal
theory. Suppose first that $C$ is an atom and consider the renewal sequence $u_0 = 1$
and $u_n = \mathbb{P}(X_n \in C \mid X_0 \in C) = \mathbb{P}_\nu(X_{n-1} \in C)$ for $n \ge 1$. The $V$-uniform
ergodicity implies that $\pi(C) = \lim_{n\to\infty}\mathbb{P}_\nu(X_{n-1} \in C) = \lim_{n\to\infty} u_n = u_\infty$, say.
Thus $\rho_C^{-1}$ is the radius of convergence of the series $\sum_{n=1}^\infty (u_n - u_\infty)z^n$. The
renewal sequence $u_n$, $n \ge 0$, is related to its corresponding increment sequence
$b_n = \mathbb{P}_a(\tau = n)$, $n \ge 1$, by the renewal equation

$$u(z) = 1/(1 - b(z))$$

for $|z| < 1$, where $u(z) = \sum_{n=0}^\infty u_n z^n$ and $b(z) = \sum_{n=1}^\infty b_n z^n$. The drift
condition (A2) implies that

$$\sum_{n=1}^\infty b_n\lambda^{-n} = \mathbb{E}_a(\lambda^{-\tau}) \le \lambda^{-1}K$$

(see Proposition 4.1) and the aperiodicity condition (A3) implies that
$b_1 = P(a, C) = \nu(C) \ge \tilde\beta$. In these circumstances a result of Kendall [5] shows
that $\rho_C < 1$. In Section 3 we sharpen Kendall’s result, using the lower bound
on $b_1$ and the upper bound on $\sum_{n=1}^\infty b_n\lambda^{-n}$ to get an upper bound on $\rho_C$,





P. H. BAXENDALE 


depending only on $\lambda$, $K$ and $\tilde\beta$, which is strictly less than 1. In fact we give
three different upper bounds on $\rho_C$. The first formula (in Theorem 3.2) is
valid with no further restrictions on the Markov chain. The second formula
(in Theorem 3.3) is valid for reversible Markov chains and the third formula
(in Corollary 3.1) is valid for Markov chains which are reversible and positive.

The idea in the nonatomic case is similar. For the split chain the renewal
sequence is given by $u_n = \beta\mathbb{P}_\nu(X_{n-1} \in C)$ for $n \ge 1$, so that $u_n \to u_\infty$ has
geometric convergence rate given by $\rho_C$. For the corresponding increment
sequence $b_n$, the estimate on $\sum_{n=1}^\infty b_n r^n$ is more complicated, see (26) and
(22), but the way in which results from Section 3 are applied is exactly the
same.


2. Formulas for $\rho$ and $M$. Here we complete the statement of Theorems
1.1, 1.2 and 1.3 by giving formulas for the constants $\rho$ and $M$. We say that
the set $C$ is an atom if $P(x, \cdot) = P(y, \cdot)$ for all $x, y \in C$. In this case we
assume that $\beta = 1$ and $\nu = P(x, \cdot)$ for any $x \in C$. If $C$ is not an atom, so
that $\beta < 1$, we define

$$\alpha_1 = 1 + \left(\log\frac{K - \beta}{1 - \beta}\right)\Big/(\log\lambda^{-1})$$

and

$$\alpha_2 = 1 + \left(\log\frac{K}{\beta}\right)\Big/(\log\lambda^{-1}).$$

In the special case when $\nu(C) = 1$, we can take $\alpha_2 = 1$. More generally,
if we have the extra information that $\nu(C) + \int_{S \setminus C} V\,d\nu \le \tilde K$, we can take
$\alpha_2 = 1 + (\log\tilde K)/(\log\lambda^{-1})$. Then define

$$R_0 = \min(\lambda^{-1}, (1 - \beta)^{-1/\alpha_1})$$

and, for $1 < R \le R_0$, define

$$L(R) = \frac{\beta R^{\alpha_2}}{1 - (1 - \beta)R^{\alpha_1}}.$$


2.1. Formulas for Theorem 1.1. For $\beta > 0$, $R > 1$ and $L > 1$, define
$R_1 = R_1(\beta, R, L)$ to be the unique solution $r \in (1, R)$ of the equation

$$\frac{r - 1}{r(\log R/r)^2} = \frac{e^2\beta(R - 1)}{8(L - 1)}. \tag{4}$$

Since the left-hand side of (4) increases monotonically from 0 to $\infty$ as $r$
increases from 1 to $R$, the value $R_1$ is well defined and is easy to compute
numerically. For $1 < r < R_1$, define

$$K_1(r, \beta, R, L) = \frac{2\beta + 2(\log N)(\log R/r)^{-1} - 8Ne^{-2}(r - 1)r^{-1}(\log R/r)^{-2}}{(r - 1)[\beta - 8Ne^{-2}(r - 1)r^{-1}(\log R/r)^{-2}]},$$







CONVERGENCE RATES FOR MARKOV CHAINS 




where $N = (L - 1)/(R - 1)$.
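
Since the left-hand side of equation (4) is monotone in $r$, the value $R_1$ can be found by bisection. The sketch below is one possible way to do this and to evaluate $K_1$; the function names `solve_R1` and `K1`, the iteration count and the sample parameters are assumptions of this illustration, not part of the paper.

```python
import math


def solve_R1(beta, R, L, iters=200):
    """Bisection for the root r in (1, R) of
    (r - 1) / (r * log(R/r)**2) = e**2 * beta * (R - 1) / (8 * (L - 1))."""
    target = math.e ** 2 * beta * (R - 1.0) / (8.0 * (L - 1.0))

    def lhs(r):
        # The left-hand side tends to infinity as r increases to R.
        if r >= R:
            return math.inf
        return (r - 1.0) / (r * math.log(R / r) ** 2)

    lo, hi = 1.0, R
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo


def K1(r, beta, R, L):
    """Evaluate K_1(r, beta, R, L) for 1 < r < R_1, with N = (L-1)/(R-1)."""
    N = (L - 1.0) / (R - 1.0)
    lg = math.log(R / r)
    a = 8.0 * N * math.exp(-2.0) * (r - 1.0) / (r * lg ** 2)
    return (2.0 * beta + 2.0 * math.log(N) / lg - a) / ((r - 1.0) * (beta - a))


# Illustrative inputs: beta = 0.5, R = 2, L = 4 (so N = 3).
R1_value = solve_R1(0.5, 2.0, 4.0)
K1_value = K1(1.03, 0.5, 2.0, 4.0)
```

With these inputs $R_1$ is only slightly larger than 1 even though $R = 2$, which is consistent with the remark later in the paper that $R_1$ is typically much closer to 1 than $R$ is.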

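Atomic case. We have $\rho = 1/R_1(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ and, for $\rho < \gamma < 1$,

$$\begin{aligned}
M ={}& \frac{\max(\lambda, K - \lambda/\gamma)}{\gamma - \lambda} + \frac{K(K - \lambda/\gamma)}{\gamma(\gamma - \lambda)} K_1(\gamma^{-1}, \tilde\beta, \lambda^{-1}, \lambda^{-1}K) \\
&+ \frac{(K - \lambda/\gamma)\max(\lambda, K - \lambda)}{(\gamma - \lambda)(1 - \lambda)} + \frac{\lambda(K - 1)}{(\gamma - \lambda)(1 - \lambda)}.
\end{aligned} \tag{5}$$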

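Nonatomic case. Let

$$\tilde R = \operatorname*{arg\,max}_{1 < R < R_0} R_1(\tilde\beta, R, L(R)).$$

Then $\rho = 1/R_1(\tilde\beta, \tilde R, L(\tilde R))$, and for $\rho < \gamma < 1$,

$$\begin{aligned}
M ={}& \frac{\max(\lambda, K - \lambda/\gamma)}{\gamma - \lambda} + \frac{K[K\gamma - \lambda - \beta(\gamma - \lambda)]}{\gamma^2(\gamma - \lambda)[1 - (1 - \beta)\gamma^{-\alpha_1}]} \\
&+ \frac{\beta\gamma^{-\alpha_2 - 2}K(K\gamma - \lambda)}{(\gamma - \lambda)[1 - (1 - \beta)\gamma^{-\alpha_1}]^2} K_1(\gamma^{-1}, \tilde\beta, \tilde R, L(\tilde R)) \\
&+ \frac{\gamma^{-\alpha_2 - 1}(K\gamma - \lambda)}{(\gamma - \lambda)[1 - (1 - \beta)\gamma^{-\alpha_1}]^2} \left( \frac{\beta\max(\lambda, K - \lambda)}{1 - \lambda} + \frac{(1 - \beta)(\gamma^{-\alpha_1} - 1)}{\gamma^{-1} - 1} \right) \\
&+ \frac{\gamma^{-\alpha_2}\lambda(K - 1)}{(1 - \lambda)(\gamma - \lambda)[1 - (1 - \beta)\gamma^{-\alpha_1}]} \\
&+ \frac{K - \lambda - \beta(1 - \lambda)}{(1 - \lambda)(1 - \gamma)} \left( (\gamma^{-\alpha_2} - 1) + \frac{(1 - \beta)(\gamma^{-\alpha_1} - 1)}{\beta} \right).
\end{aligned} \tag{6}$$

Notice that the result remains true with $\tilde R$ replaced by any $R \in (1, R_0)$, but
it does not give such a small $\rho$. We do not claim that $\tilde R$ gives the smallest
$K_1$.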
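
The maximization over $R$ in the nonatomic case has no closed form, but a crude grid search is enough to illustrate it. The sketch below assumes the readings $\alpha_1 = 1 + \log((K - \beta)/(1 - \beta))/\log\lambda^{-1}$ and $\alpha_2 = 1 + \log(K/\beta)/\log\lambda^{-1}$ for the constants of Section 2; the function names, the grid size and the sample parameters are likewise assumptions made only for illustration.

```python
import math


def alphas(beta, lam, K):
    # Assumed readings of alpha_1 and alpha_2 (see the lead-in above).
    log_inv_lam = math.log(1.0 / lam)
    a1 = 1.0 + math.log((K - beta) / (1.0 - beta)) / log_inv_lam
    a2 = 1.0 + math.log(K / beta) / log_inv_lam
    return a1, a2


def solve_R1(beta_t, R, L, iters=100):
    # Bisection for (r-1)/(r log(R/r)^2) = e^2 beta_t (R-1) / (8 (L-1)).
    target = math.e ** 2 * beta_t * (R - 1.0) / (8.0 * (L - 1.0))
    lo, hi = 1.0, R
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid < R and (mid - 1.0) / (mid * math.log(R / mid) ** 2) < target:
            lo = mid
        else:
            hi = mid
    return lo


def rho_nonatomic(beta, beta_t, lam, K, grid=400):
    """Grid search for R maximizing R_1(beta_t, R, L(R)) over (1, R_0)."""
    a1, a2 = alphas(beta, lam, K)
    R0 = min(1.0 / lam, (1.0 - beta) ** (-1.0 / a1))
    best_R, best_R1 = None, 1.0
    for i in range(1, grid):
        R = 1.0 + (R0 - 1.0) * i / grid
        denom = 1.0 - (1.0 - beta) * R ** a1
        if denom <= 0.0:
            continue  # L(R) is not finite here
        L = beta * R ** a2 / denom
        r1 = solve_R1(beta_t, R, L)
        if r1 > best_R1:
            best_R, best_R1 = R, r1
    return 1.0 / best_R1, best_R, R0


# Illustrative constants: beta = 0.3, beta_tilde = 0.2, lambda = 0.5, K = 2.
rho, R_tilde, R0 = rho_nonatomic(beta=0.3, beta_t=0.2, lam=0.5, K=2.0)
```

A grid search only approximates the maximizer, so the value of $\rho$ it returns is slightly conservative; this is harmless because any $R \in (1, R_0)$ yields a valid bound.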

2.2. Formulas for Theorem 1.2. Here we assume that the Markov chain 
is reversible. 


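Atomic case. Define

$$R_2 = \begin{cases} \sup\{r < \lambda^{-1} : 1 + 2\tilde\beta r \ge r^{1 + (\log K)/(\log\lambda^{-1})}\}, & \text{if } K > \lambda + 2\tilde\beta, \\ \lambda^{-1}, & \text{if } K \le \lambda + 2\tilde\beta. \end{cases}$$

Then $\rho = R_2^{-1}$ and, for $\rho < \gamma < 1$, replace $K_1(\gamma^{-1}, \tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ by
$K_2 = 1 + 1/(\gamma - \rho)$ in (5) for $M$ in Section 2.1. We remark that, using the
convexity of $r^{1 + (\log K)/(\log\lambda^{-1})}$, we can replace $\rho$ by the larger, but more easily
computable, $\tilde\rho$ given by

$$\tilde\rho = \begin{cases} 1 - 2\tilde\beta(1 - \lambda)/(K - \lambda), & \text{if } K > \lambda + 2\tilde\beta, \\ \lambda, & \text{if } K \le \lambda + 2\tilde\beta. \end{cases}$$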
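
To make the relation between $\rho$ and the simpler bound $\tilde\rho$ concrete, the following sketch computes both in the reversible atomic case. The function name `rho_reversible_atomic` and the sample constants are assumptions of this illustration; the bisection relies on the fact that $1 + 2\tilde\beta r - r^{1 + (\log K)/(\log\lambda^{-1})}$ is concave in $r$, positive at $r = 1$ and negative at $r = \lambda^{-1}$ when $K > \lambda + 2\tilde\beta$.

```python
import math


def rho_reversible_atomic(beta_t, lam, K, iters=200):
    """Return (rho, rho_tilde) for the reversible atomic case."""
    if K <= lam + 2.0 * beta_t:
        return lam, lam
    exponent = 1.0 + math.log(K) / math.log(1.0 / lam)

    def f(r):
        return 1.0 + 2.0 * beta_t * r - r ** exponent

    # f(1) > 0 > f(1/lam), and f is concave, so {f >= 0} is an interval
    # [1, R2] and bisection locates its right end point.
    lo, hi = 1.0, 1.0 / lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    rho = 1.0 / lo
    rho_tilde = 1.0 - 2.0 * beta_t * (1.0 - lam) / (K - lam)
    return rho, rho_tilde


# Illustrative constants: beta_tilde = 0.5, lambda = 0.5, K = 3.
rho, rho_tilde = rho_reversible_atomic(0.5, 0.5, 3.0)
# A case with K <= lambda + 2 * beta_tilde, where rho = lambda.
rho_small_K, _ = rho_reversible_atomic(0.5, 0.5, 1.2)
```

In the first example the exact $\rho$ is noticeably smaller than $\tilde\rho$, which shows what is given up by replacing the convex curve with its chord.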


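Nonatomic case. Define

$$R_2 = \begin{cases} \sup\{r < R_0 : 1 + 2\tilde\beta r \ge L(r)\}, & \text{if } L(R_0) > 1 + 2\tilde\beta R_0, \\ R_0, & \text{if } L(R_0) \le 1 + 2\tilde\beta R_0. \end{cases}$$

Then $\rho = R_2^{-1}$ and, for $\rho < \gamma < 1$, replace $K_1(\gamma^{-1}, \tilde\beta, \tilde R, L(\tilde R))$ by
$K_2 = 1 + \sqrt{\beta}/(\gamma - \rho)$ in (6) for $M$ given in Section 2.1.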


2.3. Formulas for Theorem 1.3. Here we assume that the Markov chain 
is reversible and positive. 


Atomic case. We have $\rho = \lambda$ and $M$ is calculated as in Section 2.2.

Nonatomic case. We have $\rho = R_0^{-1}$ and $M$ is calculated as in Section 2.2.

3. Kendall’s theorem. The setting for this section is discrete renewal
theory. Suppose that $V_1, V_2, \ldots$ are independent identically distributed random
variables taking values in the set of positive integers and let $b_n = \mathbb{P}(V_1 = n)$
for $n \ge 1$. Define $T_0 = 0$ and $T_k = V_1 + \cdots + V_k$ for $k \ge 1$. Let
$u_n = \mathbb{P}(\text{there exists } k \ge 0 \text{ such that } T_k = n)$ for $n \ge 0$. Thus $u_n$ is the (undelayed) renewal
sequence that corresponds to the increment sequence $b_n$. The following result
is due to Kendall [5].

Theorem 3.1. Assume that the sequence $\{b_n\}$ is aperiodic and that
$\sum_{n=1}^\infty b_n R^n < \infty$ for some $R > 1$. Then $u_\infty = \lim_{n\to\infty} u_n$ exists and the series
$\sum_{n=0}^\infty (u_n - u_\infty)z^n$ has radius of convergence greater than 1.

In this section we obtain three different lower bounds on the radius of
convergence of $\sum (u_n - u_\infty)z^n$.
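
The renewal equation $u(z) = 1/(1 - b(z))$ is equivalent to the recursion $u_0 = 1$, $u_n = \sum_{k=1}^n b_k u_{n-k}$, which gives a direct way to see Kendall's theorem at work. The sketch below applies it to a hypothetical increment distribution with $b_1 = b_2 = 1/2$, chosen only because the answer is available in closed form.

```python
def renewal_sequence(b, n_max):
    """Return [u_0, ..., u_{n_max}] for increments b[k-1] = b_k.

    Uses the recursion u_n = sum_{k=1}^{n} b_k * u_{n-k}, u_0 = 1.
    """
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(b[k - 1] * u[n - k] for k in range(1, min(n, len(b)) + 1)))
    return u


# Hypothetical aperiodic increment law: b_1 = b_2 = 1/2, mean 3/2.
u = renewal_sequence([0.5, 0.5], 30)
```

For this example $u_n = 2/3 + (1/3)(-1/2)^n$, so $u_\infty = 2/3$ is the reciprocal of the mean increment and the series $\sum (u_n - u_\infty)z^n$ has radius of convergence 2.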


3.1. General case. 


Theorem 3.2. Suppose that $\sum_{n=1}^\infty b_n R^n \le L$ and $b_1 \ge \beta$ for some
constants $R > 1$, $L < \infty$ and $\beta > 0$. Let $N = (L - 1)/(R - 1) \ge 1$. Let
$R_1 = R_1(\beta, R, L)$ be the unique solution $r \in (1, R)$ of the equation

$$\frac{r - 1}{r(\log R/r)^2} = \frac{e^2\beta}{8N}.$$

Then the series

$$\sum_{n=1}^\infty (u_n - u_\infty)z^n$$

has radius of convergence at least $R_1$. For any $r \in (1, R_1)$, define
$K_1 = K_1(r, \beta, R, L)$ by

$$K_1 = \frac{1}{r - 1}\left(1 + \frac{\beta + 2(\log N)(\log R/r)^{-1}}{\beta - 8Ne^{-2}(r - 1)r^{-1}(\log R/r)^{-2}}\right).$$

Then

$$\left|\sum_{n=0}^\infty (u_n - u_\infty)z^n\right| \le K_1 \tag{7}$$

for all $|z| \le r$.


Proof. Define the sequence $c_n = \sum_{k=n+1}^\infty b_k$ for $n \ge 0$ and define
generating functions $b(z) = \sum_{n=1}^\infty b_n z^n$, $c(z) = \sum_{n=0}^\infty c_n z^n$ and $u(z) = \sum_{n=0}^\infty u_n z^n$
for $|z| < 1$. The renewal equation gives

$$c(z) = \frac{1 - b(z)}{1 - z} = \frac{1}{(1 - z)u(z)} = \frac{1}{1 - \sum_{n=1}^\infty (u_{n-1} - u_n)z^n} \tag{8}$$

for $|z| < 1$. Since the power series for $c(z)$ has nonnegative coefficients, for
$|z| \le R$ we have

$$|c(z)| \le c(R) = \frac{b(R) - 1}{R - 1} \le \frac{L - 1}{R - 1} = N$$

so that $c(z)$ is holomorphic on $|z| < R$. Now

$$\Re((1 - z)c(z)) = \Re(1 - b(z)) = \sum_{n=1}^\infty b_n\Re(1 - z^n) \ge \beta\,\Re(1 - z)$$

for $|z| \le 1$, since every term in the sum is nonnegative and $b_1 \ge \beta$. It follows that

$$|c(re^{i\theta})| \ge \beta\,\frac{\Re(1 - re^{i\theta})}{|1 - re^{i\theta}|} \ge \beta\left|\sin\left(\frac{\theta}{2}\right)\right|$$

for all $r \le 1$. In particular, since $c(r) > 0$ for all $r \ge 0$, we see that $c(z) \ne 0$
whenever $|z| \le 1$. For $1 < r < R$,

$$|c(re^{i\theta})| \ge \beta|\sin(\theta/2)| - |c(re^{i\theta}) - c(e^{i\theta})| \ge \beta|\sin(\theta/2)| - (c(r) - c(1)).$$

Moreover, for $1 < r < R$,

$$\begin{aligned}
|c(re^{i\theta})| &\ge c(r) - |re^{i\theta} - r|\sup\{|c'(z)| : z \in [r, re^{i\theta}]\} \\
&\ge c(r) - |re^{i\theta} - r|\,c'(r) \\
&= c(r) - 2r|\sin(\theta/2)|\,c'(r).
\end{aligned}$$

Combining these two estimates we obtain

$$|c(re^{i\theta})| \ge \frac{\beta - A(r)}{\beta/c(r) + B(r)},$$

where $A(r) = 2rc'(r)[c(r) - c(1)]/c(r)$ and $B(r) = 2rc'(r)/c(r)$. Since the
power series for $c$ has nonnegative coefficients, we may apply Hölder’s
inequality to obtain

$$c(s) \le c(r)\left(\frac{s}{r}\right)^{(\log c(R)/c(r))/(\log R/r)}$$

for $0 < r < s < R$. Letting $s \searrow r$ gives

$$c'(r) \le \frac{c(r)\log c(R)/c(r)}{r\log R/r}$$

and, consequently,

$$c(r) - c(1) \le \frac{(r - 1)c(r)\log c(R)/c(r)}{r\log R/r}$$

for $1 < r < R$. Thus we obtain the estimates

$$A(r) \le \frac{2(r - 1)}{r}\,c(r)\left[\log\frac{N}{c(r)}\right]^2\left[\log\frac{R}{r}\right]^{-2}$$

and

$$B(r) \le 2\left[\log\frac{N}{c(r)}\right]\left[\log\frac{R}{r}\right]^{-1}.$$

Using the inequality

$$x\left[\log\frac{N}{x}\right]^2 \le 4Ne^{-2}$$

for $0 < x \le N$ in $A(r)$ and the inequality $c(r) \ge 1$ in $B(r)$ we get

$$A(r) \le \frac{8Ne^{-2}(r - 1)}{r}\left[\log\frac{R}{r}\right]^{-2}$$

and

$$B(r) \le 2\log N\left[\log\frac{R}{r}\right]^{-1}.$$

Thus for $1 < r < R_1$ we have

$$|c(re^{i\theta})| \ge \frac{\beta - 8Ne^{-2}(r - 1)r^{-1}(\log R/r)^{-2}}{\beta + 2(\log N)(\log R/r)^{-1}} > 0. \tag{9}$$

Therefore $c(z) \ne 0$ for all $|z| < R_1$. Recalling (8), we see that
$\sum_{n=1}^\infty (u_{n-1} - u_n)z^n$ is holomorphic on $|z| < R_1$ and, therefore, $r^n|u_{n-1} - u_n| \to 0$ as $n \to \infty$
for each $r < R_1$. It follows directly that $u_\infty = \lim_{n\to\infty} u_n$ exists and
$r^n|u_n - u_\infty| \to 0$ as $n \to \infty$ for all $r < R_1$. Furthermore, using the fact
$u_n - u_\infty = \sum_{m=n+1}^\infty (u_{m-1} - u_m)$, we get

$$\sum_{n=0}^\infty (u_n - u_\infty)z^n = \frac{1}{z - 1}\left(\sum_{m=1}^\infty (u_{m-1} - u_m)z^m - (1 - u_\infty)\right)$$

whenever $1 < |z| < R_1$. Therefore, using (8) again, for $1 < r < R_1$ we have

$$\sup_{|z| \le r}\left|\sum_{n=0}^\infty (u_n - u_\infty)z^n\right| = \sup_{|z| = r}\left|\sum_{n=0}^\infty (u_n - u_\infty)z^n\right| \le \frac{1}{r - 1}\left[1 + \sup_{|z| = r}\frac{1}{|c(z)|}\right]$$

and now (7) follows from (9). □


The estimates in Theorem 3.2 apply to a very general class of renewal 
sequences and as a result they are very far from the best possible in certain 
more restricted settings. We see in Theorem 3.3 and Corollary 3.1 that the 
estimates can be dramatically improved when we have extra information 
about the origin of the renewal sequence. Meanwhile, the following discussion 
shows that the estimate on the radius of convergence in Theorem 3.2 can be 
of the correct order of magnitude. 

Suppose that $\beta$ and $L$ are fixed. Then as $R \searrow 1$ we have

$$R_1 - 1 \sim \frac{e^2\beta(R - 1)^3}{8(L - 1)}. \tag{10}$$

The effect of the $(R - 1)^3$ term is that, typically, $R_1$ is very much closer
to 1 than $R$ is. This is a major contributing factor to the disappointing
estimates obtained using Theorem 1.1 in the examples in Sections 8.1 and
8.2. However, in the absence of any further information beyond that given
by the constants $\beta$, $R$ and $L$, the following calculations show that the term
$(R - 1)^3$ in (10) is optimal.

Consider the family of examples $b(z) = \beta z + (1 - \beta)z^k$ for fixed $\beta$ and
$k \to \infty$. For each $k$ there is a solution $z_k$ of the equation $\beta z + (1 - \beta)z^k = 1$
near $e^{2\pi i/k}$. Calculating the asymptotic expansion for $z_k e^{-2\pi i/k}$ in powers of
$1/k$ we obtain

$$z_k e^{-2\pi i/k} = 1 - \left(\frac{2\pi\beta i}{1 - \beta}\right)k^{-2} + \left(\frac{2\pi^2\beta}{(1 - \beta)^2} + \frac{2\pi\beta^2 i}{(1 - \beta)^2}\right)k^{-3} + O(k^{-4})$$

and thus

$$|z_k| = 1 + \frac{2\pi^2\beta}{(1 - \beta)^2}\,k^{-3} + O(k^{-4}).$$

For fixed $\beta$ and $L$ this example satisfies the conditions of Theorem 3.2 as long
as $\beta R + (1 - \beta)R^k = L$. As $k \to \infty$ we have $R - 1 \sim \log R \sim (1/k)\log\left(\frac{L - \beta}{1 - \beta}\right)$
and thus

$$|z_k| - 1 \sim \frac{2\pi^2\beta}{(1 - \beta)^2}\left[\log\left(\frac{L - \beta}{1 - \beta}\right)\right]^{-3}(R - 1)^3.$$

It is clear from the proof of Theorem 3.2 that any $r$ satisfying (7) must
satisfy $r < |z_k|$. Thus the factor $(R - 1)^3$ in (10) is optimal, although clearly
the factor $e^2\beta/8(L - 1)$ is not.
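
The $k^{-3}$ behaviour of $|z_k| - 1$ in this family of examples can be checked numerically. The sketch below locates the root near $e^{2\pi i/k}$ by Newton's method and compares $(|z_k| - 1)k^3$ with the leading coefficient $2\pi^2\beta/(1 - \beta)^2$; the function name, the iteration count and the choice $\beta = 0.5$, $k = 2000$ are assumptions of this illustration.

```python
import cmath
import math


def root_near_unit_circle(beta, k, iters=50):
    """Newton iteration for beta*z + (1-beta)*z**k = 1 started at exp(2*pi*i/k)."""
    z = cmath.exp(2j * math.pi / k)
    for _ in range(iters):
        f = beta * z + (1.0 - beta) * z ** k - 1.0
        df = beta + (1.0 - beta) * k * z ** (k - 1)
        z -= f / df
    return z


beta, k = 0.5, 2000
z_k = root_near_unit_circle(beta, k)
scaled_excess = (abs(z_k) - 1.0) * k ** 3
leading_coefficient = 2.0 * math.pi ** 2 * beta / (1.0 - beta) ** 2
```

The two quantities agree up to a relative error of order $1/k$, as expected from the $O(k^{-4})$ remainder in the expansion.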


3.2. Reversible case. In this section we assume that the renewal sequence
$u_n$ is generated by a Markov chain $\{X_n : n \ge 0\}$ which is reversible with
respect to its invariant probability measure $\pi$. Thus

$$\pi(dx)P(x, dy) = \pi(dy)P(y, dx)$$

in the sense that the measures on $S \times S$ given by the left-hand and right-hand
sides agree.


Theorem 3.3. Let $\{X_n : n \ge 0\}$ be a Markov chain which is reversible
with respect to a probability measure $\pi$ and satisfies $P(x, dy) \ge \beta\mathbf{1}_C(x)\nu(dy)$
for some set $C$ and probability measure $\nu$. Let $\{u_n : n \ge 0\}$ be the renewal
sequence given by $u_0 = 1$ and $u_n = \beta\mathbb{P}_\nu(X_{n-1} \in C)$ for $n \ge 1$, and suppose
that the corresponding increment sequence $\{b_n : n \ge 1\}$ satisfies $\sum_{n=1}^\infty b_n R^n \le L$
and $b_1 \ge \tilde\beta$ for some constants $R > 1$, $L < \infty$ and $\tilde\beta > 0$. If $L > 1 + 2\tilde\beta R$,
define $R_2 = R_2(\tilde\beta, R, L)$ to be the unique solution $r \in (1, R)$ of the equation

$$1 + 2\tilde\beta r = r^{(\log L)/(\log R)}$$

and let $R_2 = R$ otherwise. Then the series

$$\sum_{n=0}^\infty (u_n - u_\infty)z^n$$

has radius of convergence at least $R_2$. Moreover, if

$$\lim_{n\to\infty}\left|\int_C P^n\mathbf{1}_C(x)\,\pi(dx) - (\pi(C))^2\right| r^n < \infty \quad \text{for all } r < R_2, \tag{11}$$

then, for $1 < r < R_2$, we have

$$\sum_{n=1}^\infty |u_n - u_\infty| r^n \le \frac{\sqrt{\beta}\,r}{1 - r/R_2}. \tag{12}$$

Proof. Notice first that the discussion of split chains in Section 4.2
implies that $\{u_n : n \ge 0\}$ is indeed a renewal sequence. The reversibility implies
that the transition operator $P$ for the original chain $\{X_n : n \ge 0\}$ acts as
a self-adjoint contraction on the Hilbert space $L^2(\pi)$. We use $\langle\cdot,\cdot\rangle$ for the
inner product in $L^2(\pi)$ and $\|\cdot\|$ for the corresponding norm. For any $A \subset S$
we have

$$\pi(A) = \int P(x, A)\,\pi(dx) \ge \beta\nu(A)\pi(C),$$

so that $\nu$ is absolutely continuous with respect to $\pi$ and has Radon-Nikodym
derivative $d\nu/d\pi \le 1/(\beta\pi(C))$. Throughout this proof we write $f = \mathbf{1}_C$ and
$g = d\nu/d\pi$. Then $f, g \in L^2(\pi)$ with $\|f\|^2 = \pi(C)$ and $\|g\|^2 \le 1/(\beta\pi(C))$. Now
for $|z| < 1$,

$$\begin{aligned}
(1 - z)u(z) &= (1 - z) + \beta(1 - z)\sum_{n=1}^\infty \langle P^{n-1}f, g\rangle z^n \\
&= (1 - z) + \beta z(1 - z)\langle (I - zP)^{-1}f, g\rangle.
\end{aligned}$$

Since $P$ is a self-adjoint contraction on $L^2(\pi)$, its spectrum is a subset of
$[-1, 1]$ and we have a spectral resolution

$$P = \int_{[-1,1]} \lambda\,dE(\lambda)$$

(see, e.g., [23], Section XI.6), where $E(1) = I$ and $\lim_{\lambda\nearrow -1}E(\lambda) = 0$. Write
$F(\lambda) = \langle E(\lambda)f, g\rangle$. The function $F$ is of bounded variation and the
corresponding signed measure $\mu_{f,g}$, say, is supported on $[-1, 1]$ and has total mass
$|\mu_{f,g}|([-1, 1]) \le \|f\|\cdot\|g\| \le \beta^{-1/2}$. We obtain for $|z| < 1$,

$$(1 - z)u(z) = (1 - z) + \beta z(1 - z)\int_{[-1,1]}(1 - z\lambda)^{-1}\mu_{f,g}(d\lambda)$$

and so the function $(1 - z)u(z)$ has a holomorphic extension at least to
$\{z \in \mathbb{C} : z^{-1} \notin [-1, 1]\} = \mathbb{C} \setminus ((-\infty, -1] \cup [1, \infty))$. The renewal equation gives

$$(1 - z)u(z) = \frac{1 - z}{1 - b(z)}$$

for $|z| < 1$, and the function $b$ is holomorphic in $B(0, R)$. It follows that the
only solutions in $B(0, R)$ of the equation $b(z) = 1$ lie on one or the other of the
intervals $(-R, -1]$ and $[1, R)$. Since $b'(1) > 0$, the zero of $b(z) - 1$ at $z = 1$ is
a simple zero. For $1 < r < R$ we have $b(r) > b(1) = 1$. For $1 < r < R$ we also
have $b(-r) \le -2b_1 r + b(r)$. Using the estimate $b(r) \le [b(R)]^{(\log r)/(\log R)} \le
r^{(\log L)/(\log R)}$, it follows that for $1 < r < R_2$ we have $b(-r) < 1$, where $R_2$ is
given in the statement of the theorem. Thus $(1 - z)u(z)$ has a holomorphic
extension to $B(0, R_2)$ and the first statement of the theorem follows as in
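the proof of Theorem 3.2.

Now we assume (11). Given $r < R_2$ we have

$$|\langle P^n f, f\rangle - (\pi(C))^2| \le Mr^{-n} \tag{13}$$

for some $M$ (depending on $r$). Recalling the spectral resolution, we have

$$\langle P^n f, f\rangle = \int_{[-1,1]}\lambda^n\,d\langle E(\lambda)f, f\rangle.$$

Letting $n \to \infty$ we get

$$\lim_{n\to\infty}\langle P^n f, f\rangle = \int_{\{1\}} d\langle E(\lambda)f, f\rangle$$

and so (13) may be rewritten as

$$\left|\int_{[-1,1)}\lambda^n\,d\langle E(\lambda)f, f\rangle\right| \le Mr^{-n}. \tag{14}$$

Now $\lambda \to \langle E(\lambda)f, f\rangle$ is an increasing function and hence corresponds to a
positive measure $\mu_f$, say, on $[-1, 1]$. Letting $n \to \infty$ in (14) through the even
integers, we see that $\mu_f([-1, -1/r)) = \mu_f((1/r, 1)) = 0$. This is true for all
$r < R_2$ and so $\langle E(\lambda)f, f\rangle$ is constant on $[-1, -1/R_2)$ and on $(1/R_2, 1)$. It
follows that $F(\lambda) = \langle E(\lambda)f, g\rangle$ is constant on these same intervals and so
the support of $|\mu_{f,g}|$ is contained in $[-1/R_2, 1/R_2] \cup \{1\}$. Noting that

$$u_\infty = \lim_{n\to\infty}\beta\langle P^{n-1}f, g\rangle = \beta\lim_{n\to\infty}\int\lambda^{n-1}\mu_{f,g}(d\lambda) = \beta\mu_{f,g}(\{1\})$$

we get, for $n \ge 1$,

$$\begin{aligned}
|u_n - u_\infty| &= \beta\left|\int_{[-1/R_2, 1/R_2]}\lambda^{n-1}\mu_{f,g}(d\lambda)\right| \\
&\le \beta\left(\frac{1}{R_2}\right)^{n-1}|\mu_{f,g}|([-1/R_2, 1/R_2]) \\
&\le \sqrt{\beta}\left(\frac{1}{R_2}\right)^{n-1}.
\end{aligned}$$

So for $r < R_2$, we get

$$\sum_{n=1}^\infty |u_n - u_\infty| r^n \le \frac{\sqrt{\beta}\,r}{1 - r/R_2}$$

as required. □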


Remark 3.1. The estimate (12) is true without the extra assumption
(11) if $P$ is a compact operator on $L^2(\pi)$. The first assertion in Theorem 3.3
implies that
for all $r < R_2$ and the compactness implies that the restriction of $\mu_{f,g}$ to
$[-1, 1] \setminus [-1/R_2, 1/R_2]$ is a finite sum of atoms. It then follows directly that
the support of $|\mu_{f,g}|$ is contained in $[-1/R_2, 1/R_2] \cup \{1\}$.

Corollary 3.1. In the setting of Theorem 3.3, assume also that

$$\int Pf(x)f(x)\,\pi(dx) \ge 0 \quad \text{for all } f \in L^2(\pi).$$

Then in the assertions of Theorem 3.3 we can take $R_2 = R$.

Proof. The additional assumption implies that the spectrum of $P$ is
contained in $[0, 1]$. Arguing as in the proof of Theorem 3.3, we obtain, for
$|z| < 1$,

$$(1 - z)u(z) = (1 - z) + \beta z(1 - z)\int_{[0,1]}(1 - z\lambda)^{-1}\mu_{f,g}(d\lambda)$$

and so the function $(1 - z)u(z)$ has a holomorphic extension at least to
$\{z \in \mathbb{C} : z^{-1} \notin [0, 1]\} = \mathbb{C} \setminus [1, \infty)$. It follows that the equation $b(z) = 1$ cannot
have a solution in $(-R, -1]$ and so $(1 - z)u(z)$ is holomorphic on $B(0, R)$.
The remainder of the proof goes as in Theorem 3.3. □

The following lemma enables us to apply Corollary 3.1 and Theorem 1.3 
to a large class of Metropolis-Hastings chains, including the example in 
Section 8.2. 


Lemma 3.1. The Metropolis-Hastings chain generated by a candidate
transition density $q(x, y)$ of the form

$$q(x, y) = \int r(z, x)r(z, y)\,dz$$

is reversible and positive.


Proof. Since a Metropolis-Hastings chain is automatically reversible,
it suffices to check positivity. For notational convenience, we identify the
measure $\pi$ with its density $\pi(x)$ with respect to the reference measure $dx$.
Notice first that for any $g \in L^2(\pi)$ we have

$$\begin{aligned}
\iint g(x)g(y)\min(\pi(x), \pi(y))\,dx\,dy
&= \iint g(x)g(y)\left(\int_0^\infty \mathbf{1}_{[0,\pi(x)]}(t)\mathbf{1}_{[0,\pi(y)]}(t)\,dt\right)dx\,dy \\
&= \int_0^\infty\left(\iint g(x)\mathbf{1}_{[0,\pi(x)]}(t)\,g(y)\mathbf{1}_{[0,\pi(y)]}(t)\,dx\,dy\right)dt \\
&= \int_0^\infty\left(\int g(x)\mathbf{1}_{[0,\pi(x)]}(t)\,dx\right)^2 dt \\
&\ge 0.
\end{aligned} \tag{15}$$

The assumption on $q$ implies that $q(x, y) = q(y, x)$, and so the kernel $P$ for
the Metropolis-Hastings chain is given by

$$Pf(x) = \int f(y)\min(\pi(y)/\pi(x), 1)\,q(x, y)\,dy + a(x)f(x)$$

for some $a(x) \ge 0$. Then, for $f \in L^2(\pi)$, we have

$$\int Pf(x)f(x)\pi(x)\,dx = \iint f(x)f(y)\min(\pi(x), \pi(y))\,q(x, y)\,dx\,dy + \int a(x)f(x)^2\pi(x)\,dx.$$

Clearly the second term on the right-hand side is nonnegative, and the first
term on the right-hand side is

$$\begin{aligned}
\iint f(x)f(y)&\min(\pi(x), \pi(y))\left(\int r(z, x)r(z, y)\,dz\right)dx\,dy \\
&= \int\left(\iint f(x)r(z, x)\,f(y)r(z, y)\min(\pi(x), \pi(y))\,dx\,dy\right)dz \\
&\ge 0,
\end{aligned}$$

where we use (15) with $g(x) = f(x)r(z, x)$ and then integrate with respect to
$z$. □

Remark 3.2. The condition on $q$ is satisfied if $r$ is a symmetric Markov
kernel and $q$ corresponds to two steps of $r$.

4. Proof of Theorem 1.1. In this section we describe the methods used
to obtain the formulas in Section 2.1 for $\rho$ and $M$. From the results of Meyn
and Tweedie [8, 9] we know that $\{X_n : n \ge 0\}$ is $V$-uniformly ergodic, with
invariant probability measure $\pi$, say. We concentrate on the calculation of $\rho$
and $M$. We do not make any assumption of reversibility in this section. At
the appropriate point in the argument we appeal to Theorem 3.2. Proofs of
Propositions 4.1-4.4 appear in the Appendix.

4.1. Atomic case. Suppose that $C$ is an atom for the Markov chain.
Then in the minorization condition (A1) we can take $\beta = 1$ and $\nu = P(a, \cdot)$
for some fixed point $a \in C$. Let $\tau$ be the stopping time

$$\tau = \inf\{n \ge 1 : X_n \in C\}$$

and define $u_n = \mathbb{P}_a(X_n \in C)$ for $n \ge 0$. Then $u_n$ is the renewal sequence
that corresponds to the increment sequence $b_n = \mathbb{P}_a(\tau = n)$ for $n \ge 1$. Define
functions $G(r, x)$ and $H(r, x)$ by

$$G(r, x) = \mathbb{E}_x(r^\tau),$$

$$H(r, x) = \mathbb{E}_x\left(\sum_{n=1}^{\tau} r^n V(X_n)\right)$$

for all $x \in S$ and all $r > 0$ for which the right-hand sides are defined. Most of
the following result is well known (see, e.g., [7], Lemma 2.2 and Theorem 3.1).
The estimate in (iv) appears to be new, and helps to reduce our estimate
for $M$.

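Proposition 4.1. Assume only the drift condition (A2).

(i) For all $x \in S$, $\mathbb{P}_x(\tau < \infty) = 1$.

(ii) For $1 \le r \le \lambda^{-1}$,

$$G(r, x) \le \begin{cases} V(x), & \text{if } x \notin C, \\ rK, & \text{if } x \in C. \end{cases}$$

(iii) For $0 < r < \lambda^{-1}$,

$$H(r, x) \le \begin{cases} \dfrac{r\lambda V(x)}{1 - r\lambda}, & \text{if } x \notin C, \\[2ex] \dfrac{r(K - r\lambda)}{1 - r\lambda}, & \text{if } x \in C. \end{cases}$$

(iv) For $1 < r < \lambda^{-1}$ and $x \in C$,

$$\frac{H(r, x) - rH(1, x)}{r - 1} \le \frac{\lambda r(K - 1)}{(1 - \lambda)(1 - r\lambda)}.$$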

The following result is a minor variation of results in [8]. 

Proposition 4.2. Assume only that the Markov chain is geometrically
ergodic with (unique) invariant probability measure $\pi$, that $C$ is an atom
and that $V$ is a nonnegative function. Suppose $g : S \to \mathbb{R}$ satisfies $\|g\|_V \le 1$.
Then

$$\begin{aligned}
\sup_{|z| \le r}\left|\sum_{n=1}^\infty\left(P^n g(x) - \int g\,d\pi\right)z^n\right|
\le{}& H(r, x) + G(r, x)H(r, a)\sup_{|z| \le r}\left|\sum_{n=0}^\infty (u_n - u_\infty)z^n\right| \\
&+ H(r, a)\,\frac{G(r, x) - 1}{r - 1} + \frac{H(r, a) - rH(1, a)}{r - 1}
\end{aligned}$$

for all $r > 1$ for which the right-hand side is finite.














It is an immediate consequence of Propositions 4.1 and 4.2 that
$\rho_V \le \max(\lambda, \rho_C)$ when $C$ is an atom.


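Proof of estimates for the atomic case. We apply Theorem 3.2
to the sequence $u_n$. For the increment sequence $b_n = \mathbb{P}_a(\tau = n)$ we have
$\sum_{n=1}^\infty b_n\lambda^{-n} = \mathbb{E}_a(\lambda^{-\tau}) = G(\lambda^{-1}, a) \le \lambda^{-1}K$. Moreover the aperiodicity
condition (A3) gives $b_1 = P(a, C) \ge \tilde\beta$. For $1 < r < R_1(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ and
$K_1 = K_1(r, \tilde\beta, \lambda^{-1}, \lambda^{-1}K)$, Theorem 3.2 gives

$$\sup_{|z| \le r}\left|\sum_{n=0}^\infty (u_n - u_\infty)z^n\right| \le K_1.$$

By substituting this and the estimates from Proposition 4.1 into Proposition
4.2 together with the inequality

$$\frac{G(r, x) - 1}{r - 1} \le \frac{G(\lambda^{-1}, x) - 1}{\lambda^{-1} - 1} \le \frac{\max(\lambda, K - \lambda)}{1 - \lambda}\,V(x),$$

we get

$$\sup_{|z| \le r}\left|\sum_{n=1}^\infty\left(P^n g(x) - \int g\,d\pi\right)z^n\right| \le MV(x)$$

and so

$$\left|P^n g(x) - \int g\,d\pi\right| \le MV(x)r^{-n},$$

where

$$\begin{aligned}
M ={}& \frac{r\max(\lambda, K - r\lambda)}{1 - r\lambda} + \frac{r^2K(K - r\lambda)}{1 - r\lambda}\,K_1(r, \tilde\beta, \lambda^{-1}, \lambda^{-1}K) \\
&+ \frac{r(K - r\lambda)\max(\lambda, K - \lambda)}{(1 - r\lambda)(1 - \lambda)} + \frac{\lambda r(K - 1)}{(1 - r\lambda)(1 - \lambda)}.
\end{aligned} \tag{16}$$

Therefore, we can take $\rho = 1/R_1(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ and the formula for $M$ is
obtained by putting $r = 1/\gamma$ in (16). □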


4.2. Nonatomic case. If C is not an atom, then in the minorization
condition (A1) we must have $\beta < 1$. Following Nummelin ([10], Section 4.4), we
consider the split chain $\{(X_n, Y_n): n \ge 0\}$ with state space $S \times \{0,1\}$ and
transition probabilities given by

\[ P\{Y_n = 1 \mid \mathcal{F}^X_n \vee \mathcal{F}^Y_{n-1}\} = \beta\mathbf{1}_C(X_n), \]

\[ P\{X_{n+1} \in A \mid \mathcal{F}^X_n \vee \mathcal{F}^Y_n\} =
 \begin{cases} \nu(A), & \text{if } Y_n = 1, \\[1ex]
 \dfrac{P(X_n, A) - \beta\mathbf{1}_C(X_n)\nu(A)}{1 - \beta\mathbf{1}_C(X_n)}, & \text{if } Y_n = 0. \end{cases} \]

Here $\mathcal{F}^X_n = \sigma\{X_r : 0 \le r \le n\}$ and $\mathcal{F}^Y_n = \sigma\{Y_r : 0 \le r \le n\}$. Thus the split
chain evolves as follows. Given $X_n$, choose $Y_n$ so that $P(Y_n = 1) = \beta\mathbf{1}_C(X_n)$.
If $Y_n = 1$ then $X_{n+1}$ has distribution $\nu$, whereas if $Y_n = 0$ then $X_{n+1}$ has
distribution $(P(X_n, \cdot) - \beta\mathbf{1}_C(X_n)\nu)/(1 - \beta\mathbf{1}_C(X_n))$. The split chain
$\{(X_n, Y_n): n \ge 0\}$ is designed so that it has an atom $S \times \{1\}$ and so that its first
component $\{X_n : n \ge 0\}$ is a copy of the original Markov chain.
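The construction can be made concrete on a finite state space. The following Python sketch is only an illustration and is not part of the argument: the three-state kernel `P`, the set `C`, and the values `beta` and `nu` are hypothetical choices satisfying (A1), and the helper names `residual_row` and `split_step` are ours.

```python
import random

def residual_row(p_row, beta, nu):
    """Law of X_{n+1} given X_n = x in C and Y_n = 0: (P(x,.) - beta*nu) / (1 - beta)."""
    return [(p - beta * v) / (1.0 - beta) for p, v in zip(p_row, nu)]

def split_step(x, P, C, beta, nu, rng):
    """One transition of the split chain from X_n = x; returns (Y_n, X_{n+1})."""
    in_C = x in C
    # Y_n = 1 with probability beta * 1_C(X_n)
    y = 1 if (in_C and rng.random() < beta) else 0
    if y == 1:
        weights = nu                                  # regenerate from nu
    elif in_C:
        weights = residual_row(P[x], beta, nu)        # failed renewal inside C
    else:
        weights = P[x]                                # outside C: original kernel
    x_next = rng.choices(range(len(weights)), weights=weights)[0]
    return y, x_next

# Hypothetical example with P(x,.) >= beta * nu for every x in C = {0, 1}
P = [[0.5, 0.3, 0.2],
     [0.4, 0.4, 0.2],
     [0.1, 0.3, 0.6]]
C = {0, 1}
beta = 0.5
nu = [0.4, 0.4, 0.2]

rng = random.Random(0)
x, path = 0, []
for _ in range(10):
    y, x = split_step(x, P, C, beta, nu, rng)
    path.append((y, x))
```

Mixing the two branches inside C with weights $\beta$ and $1 - \beta$ recovers the row $P(x, \cdot)$, which is why the first component is a copy of the original chain.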

We apply the ideas of Section 4.1 to the split chain $(X_n, Y_n)$ with atom
$S \times \{1\}$ and stopping time

\[ T = \min\{n \ge 1 : Y_n = 1\}. \tag{17} \]

Let $P^{x,i}$ and $E^{x,i}$ denote probability and expectation for the split chain
started with $X_0 = x$ and $Y_0 = i$. To emphasize the similarities with the
calculations in the previous section, we fix a point $a \in C$, and write $P^{x,1} = P^{a,1}$
and $E^{x,1} = E^{a,1}$. Define the renewal sequence $u_n = P^{a,1}(Y_n = 1)$ for
$n \ge 0$ and the corresponding increment sequence $b_n = P^{a,1}(T = n)$ for $n \ge 1$.
Notice that $u_n = \beta P^{a,1}(X_n \in C) = \beta P^{\nu}(X_{n-1} \in C)$ for $n \ge 1$, so that $\rho_C$
controls the rate of convergence of $u_n \to u_\infty$ in the nonatomic case also.
Following the methods used in the atomic case, we define

\[ \bar G(r,x,i) = E^{x,i}(r^T), \qquad \bar H(r,x,i) = E^{x,i}\Bigl(\sum_{n=1}^{T} r^n V(X_n)\Bigr) \]

for all $x \in S$, $i = 0,1$ and all $r > 0$ for which the right-hand sides are defined.
If we define

\[ \bar E^x = [1 - \beta\mathbf{1}_C(x)]E^{x,0} + \beta\mathbf{1}_C(x)E^{x,1}, \]

then $\bar E^x$ agrees with $E^x$ on $\mathcal{F}^X = \sigma\{X_n : n \ge 0\}$. Define

\[ \bar G(r,x) = \bar E^x(r^T), \qquad \bar H(r,x) = \bar E^x\Bigl(\sum_{n=1}^{T} r^n V(X_n)\Bigr). \]

Applying the techniques used in Proposition 4.2 to the split chain, we
obtain the following result.

Proposition 4.3. Assume only that the original Markov chain is geometrically
ergodic with (unique) invariant probability measure $\pi$ and that V
is a nonnegative function. Suppose $g: S \to \mathbb{R}$ satisfies $\|g\|_V \le 1$. Then

\[ \sup_{|z| \le r}\Bigl|\sum_{n=1}^{\infty}\Bigl(P^n g(x) - \int g\,d\pi\Bigr)z^n\Bigr|
 \le \bar H(r,x) + \bar G(r,x)\bar H(r,a,1)\sup_{|z| \le r}\Bigl|\sum_{n=0}^{\infty}(u_n - u_\infty)z^n\Bigr|
 + \bar H(r,a,1)\,\frac{\bar G(r,x) - 1}{r - 1} + \frac{\bar H(r,a,1) - r\bar H(1,a,1)}{r - 1} \]

for all $r > 1$ for which the right-hand side is finite.


We need to extend the estimates on $G(r,x)$ and $H(r,x)$ from Section 4.1
to estimates on the corresponding functions $\bar G(r,x,i)$ and $\bar H(r,x,i)$ defined
in terms of the split chain and the stopping time T. Define

\[ \tilde G(r) = \sup\{E^{x,0}(r^{\tau}) : x \in C\}. \]

Notice that the initial condition $(x,0)$ for $x \in C$ represents a failed opportunity
for the split chain to renew. Thus $\tilde G(r)$ represents the extra contribution
to $\bar G(r,x,i)$ and $\bar H(r,x,i)$ which occurs every time the split chain has $X_n \in C$
but fails to have $Y_n = 1$. Given $X_n \in C$, this failure occurs with probability
$(1 - \beta)$. Thus to get finite estimates for $\bar G(r,x,i)$ and $\bar H(r,x,i)$, we insist
on the condition $(1 - \beta)\tilde G(r) < 1$. This idea is formalized in Lemmas A.1
and A.2 in the Appendix. For our purposes here the important estimates
are given in the following result. The estimate (19) and an estimate closely
related to (21) appear in [15], where $R_0$ is denoted $\beta_{RT}$.


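Proposition 4.4. Assume conditions (A1) and (A2) with $\beta < 1$. Define

\[ \alpha_1 = 1 + \Bigl(\log\frac{K - \beta}{1 - \beta}\Bigr)\Big/(\log\lambda^{-1}). \tag{18} \]

Then, for $1 \le r \le \lambda^{-1}$,

\[ \tilde G(r) \le r^{\alpha_1}. \tag{19} \]

Furthermore, define

\[ \alpha_2 = 1 + \Bigl(\log\frac{K}{\beta}\Bigr)\Big/(\log\lambda^{-1}) \tag{20} \]

and $R_0 = \min(\lambda^{-1}, (1 - \beta)^{-1/\alpha_1})$. Then

\[ \bar G(r,x) \le \frac{\beta G(r,x)}{1 - (1 - \beta)r^{\alpha_1}}, \tag{21} \]

\[ \bar G(r,a,1) \le \frac{\beta r^{\alpha_2}}{1 - (1 - \beta)r^{\alpha_1}} \equiv L(r), \tag{22} \]

\[ \bar H(r,x) \le H(r,x) + \frac{r[K - r\lambda - \beta(1 - r\lambda)]}{(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]}\,G(r,x), \tag{23} \]

\[ \bar H(r,a,1) \le \frac{r^{\alpha_2 + 1}(K - r\lambda)}{(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]}, \tag{24} \]

\[ \frac{\bar H(r,a,1) - r\bar H(1,a,1)}{r - 1}
 \le \frac{r^{\alpha_2 + 1}\lambda(K - 1)}{(1 - \lambda)(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]}
 + \frac{r[K - \lambda - \beta(1 - \lambda)]}{(1 - \lambda)[1 - (1 - \beta)r^{\alpha_1}]}
 \Bigl(\frac{r^{\alpha_2} - 1}{r - 1} + \frac{(1 - \beta)(r^{\alpha_1} - 1)}{\beta(r - 1)}\Bigr) \tag{25} \]

whenever $1 < r < R_0$.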
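The constants in Proposition 4.4 are simple to evaluate numerically. The sketch below, which is only an illustration, computes $\alpha_1$, $\alpha_2$, $R_0$ and the function $L(r)$ of (22); the function names and the sample values of $\lambda$, $K$ and $\beta$ are hypothetical and are not taken from any example in this paper.

```python
import math

def prop_4_4_constants(lam, K, beta):
    """Return (alpha1, alpha2, R0) from (18), (20) and the definition of R_0."""
    log_inv_lam = math.log(1.0 / lam)
    alpha1 = 1.0 + math.log((K - beta) / (1.0 - beta)) / log_inv_lam
    alpha2 = 1.0 + math.log(K / beta) / log_inv_lam
    R0 = min(1.0 / lam, (1.0 - beta) ** (-1.0 / alpha1))
    return alpha1, alpha2, R0

def L(r, beta, alpha1, alpha2):
    """Right-hand side of (22); treated as infinite once (1 - beta) * r**alpha1 >= 1."""
    denom = 1.0 - (1.0 - beta) * r ** alpha1
    return beta * r ** alpha2 / denom if denom > 0.0 else math.inf

# Hypothetical constants, chosen only for illustration
lam, K, beta = 0.5, 2.0, 0.5
alpha1, alpha2, R0 = prop_4_4_constants(lam, K, beta)
```

Note that $L(1) = 1$ for every admissible choice of constants, and that $L(r)$ increases to infinity as $r$ approaches $(1 - \beta)^{-1/\alpha_1}$, which is the reason for the second term in the definition of $R_0$.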


Remark 4.1. If $\nu(C) = 1$, then $G(r,a,1) = r$ and so we can take $\alpha_2 = 1$
in Proposition 4.4. More generally if we know that $\nu(C) + \int_{S \setminus C} V\,d\nu \le \tilde K$,
then we can take $\alpha_2 = 1 + (\log\tilde K)/(\log\lambda^{-1})$.


It is an immediate consequence of Propositions 4.3 and 4.4 that
$\rho_V \le \max(\lambda, \rho_C, (1 - \beta)^{1/\alpha_1})$
when C is not an atom.


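Proof of estimates for the nonatomic case. We apply Theorem 3.2
to the sequence $u_n$. For the increment sequence $b_n = P^{a,1}(T = n)$
we have

\[ \sum_{n=1}^{\infty} b_nR^n = E^{a,1}(R^T) = \bar G(R,a,1) \le L(R) \tag{26} \]

for $1 < R < R_0$, where the constant $R_0$ and the function $L(R)$ are defined in
Proposition 4.4. The aperiodicity condition (A3) implies $b_1 = \beta\nu(C) \ge \tilde\beta$.
For the moment fix a value of R in the range $1 < R < R_0$. By Theorem 3.2,
for $1 < r < R_1(\tilde\beta, R, L(R))$, we have

\[ \sup_{|z| \le r}\Bigl|\sum_{n=0}^{\infty}(u_n - u_\infty)z^n\Bigr| \le K_1(r, \tilde\beta, R, L(R)). \]

Notice that (21) implies

\[ \frac{\bar G(r,x) - 1}{r - 1} \le \frac{1}{1 - (1 - \beta)r^{\alpha_1}}
 \Bigl[\frac{\beta(G(r,x) - 1)}{r - 1} + \frac{(1 - \beta)(r^{\alpha_1} - 1)}{r - 1}\Bigr]. \]

Then using the estimates from Propositions 4.1 and 4.4 in Proposition 4.3
we get, for $1 < r < R_1(\tilde\beta, R, L(R))$,

\[ \sup_{|z| \le r}\Bigl|\sum_{n=1}^{\infty}\Bigl(P^n g(x) - \int g\,d\pi\Bigr)z^n\Bigr| \le MV(x), \]

where

\[ \begin{aligned}
M ={}& \frac{r\max(\lambda, K - r\lambda)}{1 - r\lambda}
 + \frac{r^2K[K - r\lambda - \beta(1 - r\lambda)]}{(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]} \\
 &+ \frac{\beta r^{\alpha_2 + 2}K(K - r\lambda)}{(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]^2}\,K_1(r, \tilde\beta, R, L(R)) \\
 &+ \frac{r^{\alpha_2 + 1}(K - r\lambda)}{(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]^2}
 \Bigl(\frac{\beta\max(\lambda, K - \lambda)}{1 - \lambda} + \frac{(1 - \beta)(r^{\alpha_1} - 1)}{r - 1}\Bigr) \\
 &+ \frac{r^{\alpha_2 + 1}\lambda(K - 1)}{(1 - \lambda)(1 - r\lambda)[1 - (1 - \beta)r^{\alpha_1}]} \\
 &+ \frac{r[K - \lambda - \beta(1 - \lambda)]}{(1 - \lambda)[1 - (1 - \beta)r^{\alpha_1}]}
 \Bigl(\frac{r^{\alpha_2} - 1}{r - 1} + \frac{(1 - \beta)(r^{\alpha_1} - 1)}{\beta(r - 1)}\Bigr).
\end{aligned} \tag{27} \]

To obtain the smallest possible $\rho$, we choose $R \in (1, R_0]$ so as to maximize
$R_1(\tilde\beta, R, L(R))$. Then we take $\rho = 1/R_1(\tilde\beta, R, L(R))$ and substitute $r = \gamma^{-1}$
in formula (27) for M and we are done. □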

5. Proof of Theorems 1.2 and 1.3. In this section we assume that the
Markov chain $\{X_n : n \ge 0\}$ is reversible with respect to its invariant
probability measure $\pi$. We first obtain the estimates of Section 2.2.


Atomic case. The proof in Section 4.1 goes through up to the point where
we apply Theorem 3.2. Since the Markov chain is reversible, we can replace
$R_1(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ of Theorem 3.2 by $R_2 = R_2(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ of Theorem 3.3.
Then by the first part of Theorem 3.3, for $1 < r < R_2$, we have

\[ \sup_{|z| \le r}\Bigl|\sum_{n=0}^{\infty}(u_n - u_\infty)z^n\Bigr| \le K_2 \]

for some $K_2 < \infty$. At this point we do not have an estimate for $K_2$.
Continuing as in Section 4.1, we obtain, for $1 < r < R_2$,

\[ \sup_{|z| \le r}\Bigl|\sum_{n=1}^{\infty}\Bigl(P^n g(x) - \int g\,d\pi\Bigr)z^n\Bigr| \le MV(x) \tag{28} \]

for some constant M. At this point we do not have an estimate for M.
However, now in (28) we can take $g = \mathbf{1}_C$ and integrate the x variable with
respect to $\pi$ over C to obtain the estimate (11). We can now apply the
second part of Theorem 3.3 to obtain $K_2 = 1 + r/(1 - r/R_2)$. The rest of the
proof goes as in Section 4.1. We have $\rho = 1/R_2(\tilde\beta, \lambda^{-1}, \lambda^{-1}K)$ and in (16)
for M we replace $K_1$ by $K_2$.


Nonatomic case. We have the estimate $\sum_{n=1}^{\infty} b_nR^n = \bar G(R,a,1) \le L(R)$
valid for all $1 < R < R_0$, and we can choose the R for which we apply
Theorem 3.3. If $1 + 2\tilde\beta R_0 \ge L(R_0)$, then $L(R_0) < \infty$ and $\sum_{n=1}^{\infty} b_nR_0^n =
\bar G(R_0,a,1) \le L(R_0)$. We can apply Theorem 3.3 with $R = R_0$ and obtain
$R_2 = R_0$. This case can occur only when $R_0 = \lambda^{-1} < (1 - \beta)^{-1/\alpha_1}$. Otherwise
we take $\hat R$ to be the unique solution in the interval $(1, R_0)$ of the equation
$1 + 2\tilde\beta R = L(R)$ and apply Theorem 3.3 with $R = \hat R$ to obtain $R_2 = \hat R$. Then
by the first part of Theorem 3.3, for $1 < r < R_2$, we have

\[ \sup_{|z| \le r}\Bigl|\sum_{n=0}^{\infty}(u_n - u_\infty)z^n\Bigr| \le K_2 \]

for some $K_2 < \infty$. Initially we do not have an estimate for $K_2$, but the same
method as above allows us to use the second part of Theorem 3.3 and assert
that $K_2 = 1 + \sqrt{\beta}\,r/(1 - r/R_2)$. The rest of the proof goes as in Section 4.2.
We have $\rho = 1/R_2$, where $R_2 = \sup\{r < R_0 : 1 + 2\tilde\beta r \ge L(r)\}$, and in (27)
for M we replace $K_1$ by $K_2$.
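The quantity $R_2 = \sup\{r < R_0 : 1 + 2\tilde\beta r \ge L(r)\}$ can be located by bisection. The following Python sketch is one possible way to do this and relies on the uniqueness of the crossing point stated above; the function name `rho_reversible_nonatomic` and the sample constants are hypothetical.

```python
import math

def rho_reversible_nonatomic(lam, K, beta, beta_tilde, iters=200):
    """rho = 1/R_2 with R_2 = sup{r < R_0 : 1 + 2*beta_tilde*r >= L(r)}."""
    log_inv_lam = math.log(1.0 / lam)
    alpha1 = 1.0 + math.log((K - beta) / (1.0 - beta)) / log_inv_lam
    alpha2 = 1.0 + math.log(K / beta) / log_inv_lam
    R0 = min(1.0 / lam, (1.0 - beta) ** (-1.0 / alpha1))

    def gap(r):
        # L(r) - (1 + 2*beta_tilde*r); +inf where L(r) is not finite
        denom = 1.0 - (1.0 - beta) * r ** alpha1
        if denom <= 0.0:
            return math.inf
        return beta * r ** alpha2 / denom - 1.0 - 2.0 * beta_tilde * r

    if gap(R0) <= 0.0:      # 1 + 2*beta_tilde*R_0 >= L(R_0): take R_2 = R_0
        return 1.0 / R0
    lo, hi = 1.0, R0        # gap(1) = -2*beta_tilde < 0 < gap(R_0)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / lo

# Hypothetical constants, chosen only for illustration
rho = rho_reversible_nonatomic(0.5, 2.0, 0.5, 0.25)
```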

The estimates of Section 2.3 are obtained in a similar manner, using 
Corollary 3.1 in place of Theorem 3.3. 


6. $L^2$-geometric ergodicity for reversible chains. When the Markov chain
is reversible with respect to the probability measure $\pi$, the Markov operator
P acts as a self-adjoint operator on $L^2(\pi)$. The equivalence of (pointwise)
geometric ergodicity and the existence of a spectral gap for P acting on $L^2(\pi)$
was proved in [13]. Also see [17] for the equivalence of $L^2$- and $L^1$-geometric
ergodicity for reversible Markov chains.


Theorem 6.1. Assume that the Markov chain $\{X_n : n \ge 0\}$ is V-uniformly
ergodic with invariant probability $\pi$ (so that $\int V\,d\pi < \infty$) and let $\rho_V$ be the
spectral radius of $P - 1 \otimes \pi$ on $B_V$. Suppose also that $\{X_n : n \ge 0\}$ is
reversible with respect to $\pi$. Then, for all $f \in L^2(\pi)$, we have

\[ \Bigl\|P^nf - \int f\,d\pi\Bigr\|_{L^2} \le (\rho_V)^n\,\Bigl\|f - \int f\,d\pi\Bigr\|_{L^2}. \]

In particular, the spectral radius of $P - 1 \otimes \pi$ on $L^2(\pi)$ is at most $\rho_V$.

Proof. For ease of notation write $\int f\,d\pi = \bar f$. Suppose first that f is a
bounded function, so that $\|f\|_V < \infty$ and $\int |f(x)|V(x)\,d\pi(x) < \infty$. For any
$\gamma > \rho_V$ there is $M < \infty$ so that

\[ |P^nf(x) - \bar f| \le M\|f\|_V V(x)\gamma^n. \]

Multiplying by $f(x)$ and integrating with respect to $\pi$ we get

\[ |\langle P^nf, f\rangle - \bar f^{\,2}| \le M\|f\|_V\Bigl(\int |f(x)|V(x)\,d\pi(x)\Bigr)\gamma^n. \]

Arguing as in the proof of Theorem 3.3 we see that for any $g \in L^2(\pi)$ the
function $\lambda \mapsto \langle E(\lambda)f, g\rangle$ is constant on $[-1, -\rho_V)$ and on $(\rho_V, 1)$. The
corresponding signed measure $\mu_{f,g}$ has $|\mu_{f,g}|([-1,1]) \le \|f\|_{L^2}\|g\|_{L^2}$. Therefore

\[ |\langle P^nf - \bar f, g\rangle| \le \rho_V^n\,\|f\|_{L^2}\|g\|_{L^2}. \]

This is true for all $g \in L^2(\pi)$ so we obtain $\|P^nf - \bar f\|_{L^2} \le \rho_V^n\|f\|_{L^2}$. Replacing
f by $f - \bar f$ we obtain $\|P^nf - \bar f\|_{L^2} \le \rho_V^n\|f - \bar f\|_{L^2}$. Finally for arbitrary
$f \in L^2(\pi)$ there exist bounded $f_k$ so that $\|f - f_k\|_{L^2} \to 0$. Then, for each
$n \ge 0$,

\[ \|P^nf - \bar f\|_{L^2} = \lim_{k \to \infty}\|P^nf_k - \bar f_k\|_{L^2}
 \le \rho_V^n\lim_{k \to \infty}\|f_k - \bar f_k\|_{L^2} = \rho_V^n\|f - \bar f\|_{L^2} \]

and we are done. □


Corollary 6.1. Assume that the Markov chain $\{X_n : n \ge 0\}$ satisfies
(A1)-(A3) and is reversible with respect to its invariant probability measure
$\pi$. Then, for all $f \in L^2(\pi)$, we have

\[ \Bigl\|P^nf - \int f\,d\pi\Bigr\|_{L^2} \le \rho^n\,\Bigl\|f - \int f\,d\pi\Bigr\|_{L^2}, \]

where $\rho$ is given by the formulas in Section 2.2. If, additionally, the Markov
chain is positive, then the formulas in Section 2.3 may be used.


7. Relationship to existing results. 


7.1. Method of Meyn and Tweedie. For convenience we restrict this
discussion to the case when C is an atom. The essence of these comments
extends to the nonatomic case. Since C is an atom, we can assume that
$V(x) = 1$ for $x \in C$ and so (A2) is equivalent to

\[ PV(x) \le \lambda V(x) + b\mathbf{1}_C(x), \]

where $b = K - \lambda$. Also we can take $\tilde\beta = P(x,C)$ for $x \in C$. Meyn and Tweedie
[9] used an operator theory argument to reduce the problem to estimating
the left-hand side in Proposition 4.2 at $r = 1$. If

\[ \sup_{|z| \le 1}\Bigl|\sum_{n=1}^{\infty}\Bigl(P^n g(x) - \int g\,d\pi\Bigr)z^n\Bigr| \le M_1V(x) \]

whenever $\|g\|_V \le 1$, then they can take $\rho = 1 - (M_1 + 1)^{-1}$. Using the
regenerative decomposition, they obtained $M_1 \le M_2 + \zeta_C M_3$, where $M_2$ and
$M_3$ can be calculated efficiently in terms of $\lambda$ and b, and

\[ \zeta_C = \sup_{|z| \le 1}\Bigl|1 + \sum_{n=1}^{\infty}(u_n - u_{n-1})z^n\Bigr| = \sup_{|z| \le 1}|(1 - z)u(z)|, \]

where $u(z)$ is the generating function for the renewal sequence $u_n = P(X_n \in
C \mid X_0 \in C)$. With no further information about the Markov chain, they
applied a splitting technique to the forward recurrence time chain associated
with the renewal sequence $u_n$ to obtain

\[ \zeta_C \le \frac{32 - 8\tilde\beta^2}{\tilde\beta^3}\Bigl(\frac{K - \lambda}{1 - \lambda}\Bigr)^2. \tag{29} \]

We can sharpen the method of Meyn and Tweedie by putting $r = 1$ in the
estimate (9) from the proof of Theorem 1.3 to get the new estimate

\[ \zeta_C = \sup_{|z| = 1}|(1 - z)u(z)| = \Bigl(\inf_{|z| = 1}|c(z)|\Bigr)^{-1}
 \le 1 + 2\log\Bigl(\frac{L - 1}{R - 1}\Bigr)\Big/(\tilde\beta\log R)
 = 1 + \Bigl(2\log\frac{K - \lambda}{1 - \lambda}\Bigr)\Big/(\tilde\beta\log\lambda^{-1}). \tag{30} \]

With more information about the Markov chain, Meyn and Tweedie
obtained better estimates for $\zeta_C$. However, as they observed in [9], their method
of using estimates at the value $r = 1$ to obtain estimates for $r > 1$ is very far
from sharp. In particular, it cannot yield the estimate (2). By contrast, we
use a version of Kendall's theorem to estimate $\sup_{|z| \le r}|(1 - z)u(z)|$ and use
this together with the regenerative decomposition to estimate the left-hand
side of Proposition 4.2 for $r > 1$ directly.


7.2. Coupling method. Our method uses (A1) and (A2) to obtain
estimates on the generating function for the regeneration time T for the split
chain defined in (17). The estimates are based on the fact that the split
chain regenerates with probability $\beta$ whenever $X_n \in C$. The estimate on
$E(r^T \mid X_1 \sim \nu)$, which is valid for $r < R_0$, is used with (A3) in Theorem 3.2
or 3.3 or Corollary 3.1 to obtain $\rho_C$, and then we take $\rho = \max(R_0^{-1}, \rho_C)$.
The estimates on the generating function for T appear also in [15], where
$R_0$ is denoted $\beta_{RT}$.

The coupling method, introduced by Rosenthal [18], builds a bivariate
process $\{(X_n, X'_n) : n \ge 0\}$, where each component is a copy of the original
Markov chain. The stopping time of interest is the coupling time $\tilde T = \inf\{n \ge
0 : X_n = X'_n\}$. The minorization condition (A1) implies that the bivariate
process can be constructed so that

\[ P(X_{n+1} = X'_{n+1} \mid (X_n, X'_n) \in C \times C) \ge \beta. \]

Therefore, coupling can be achieved with probability $\beta$ whenever $(X_n, X'_n) \in
C \times C$. To obtain estimates on the distribution of $\tilde T$, a drift condition for the
bivariate process is needed. If the Markov chain is stochastically monotone
and C is a bottom or top set, then the univariate drift condition (A2) is
sufficient. The bivariate process can be constructed so the estimates for the
(univariate) regeneration time T apply equally to the (bivariate) coupling
time $\tilde T$. Thus we get $\rho = R_0^{-1}$. In particular, if C is an atom, we get $\rho = \lambda$.
See [7] and [21] for the case when C is an atom, and [16] for the general
case.

In the absence of stochastic monotonicity, a drift condition for the
bivariate process can be constructed using the function V which appears in (A2),
but at the cost of possibly enlarging the set C and also enlarging the effective
value of $\lambda$. Let $b = \sup_{x \in C} PV(x) - \lambda V(x)$, so that $PV(x) \le \lambda V(x) + b\mathbf{1}_C(x)$
for all $x \in S$. If $h(x,y) = [V(x) + V(y)]/2$, then

\[ (P \times P)h(x,y) \le \lambda_1 h(x,y) \quad \text{if } (x,y) \notin C \times C, \]

where

\[ \lambda_1 = \lambda + \frac{b}{1 + \min\{V(x) : x \notin C\}}. \]

Whereas (A2) asserts $\lambda < 1$, the coupling method requires the stronger
condition $\lambda_1 < 1$. This can be achieved by enlarging the set C so as to
make $\min\{V(x) : x \notin C\}$ sufficiently large. Note that the condition $PV(x) \le
\lambda V(x) + b\mathbf{1}_C(x)$ for all $x \in S$ remains true with the same values of $\lambda$ and
b when C is enlarged. However, the value of $K = \sup_{x \in C} PV(x)$ may have
increased, and the value of $\beta$ in the minorization condition (A1) may have
decreased. The coupling method now gives $\rho = \tilde R_0^{-1}$, where $\tilde R_0$ is calculated
similarly to $R_0$ except that $\lambda_1$ is used in place of $\lambda$.

Here we have followed the “simple account” of the coupling method
described in [19]. The assertion $\rho = \tilde R_0^{-1}$ is a direct consequence of [19],
Theorem 1. For various developments and extensions of this method, see also
[2, 4, 15].

Compared with the coupling method, our method has the advantage of
allowing the use of a smaller set C and a smaller numerical value of $\lambda$.
It has the disadvantage of having to apply a version of Kendall's theorem
to calculate $\rho_C$. In the general setting this is a major disadvantage, but for
reversible chains it is a minor disadvantage and for positive reversible chains
it is no disadvantage at all.

8. Numerical examples. 

8.1. Reflecting random walk. Meyn and Tweedie ([9], Section 8)
considered the Bernoulli random walk on $\mathbb{Z}^+$ with transition probabilities
$P(i, i-1) = p > 1/2$, $P(i, i+1) = q = 1 - p$ for $i \ge 1$ and boundary conditions
$P(0,0) = p$, $P(0,1) = q$. Taking $C = \{0\}$ and $V(i) = (p/q)^{i/2}$, we get
$\lambda = 2\sqrt{pq}$, $K = p + \sqrt{pq}$ and $\tilde\beta = p$.

For each of the values $p = 2/3$ and $p = 0.9$ considered in [9] we calculate $\rho$
in six different ways (see Table 1). Method MT is the original calculation
in [9], using their formula (29) for $\zeta_C$. Method MTB is the same as MT but
with our formula (30) in place of (29). Method 1.1 uses Theorem 1.1. So far
these calculations have used only the values of $\lambda$, K and $\tilde\beta$. The next three
methods all use some extra information about the Markov chain. Method
MT* uses [9] with a sharper estimate for $\zeta_C$ using the extra information
that $P(\tau = 1) = p$, $P(\tau = 2) = pq$ and $\pi(0) = 1 - q/p$. Method 1.2 uses
Theorem 1.2 with the extra information that the Markov chain is reversible.
Finally Method LT uses the fact that the chain is stochastically monotone
and gives the optimal result $\rho = \lambda$, due to Lund and Tweedie [7].
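The two bounds (29) and (30) on $\zeta_C$ used by Methods MT and MTB depend only on $\lambda$, $K$ and $\tilde\beta$, so they are easy to evaluate for this random walk. The following Python sketch is an illustration only; the function names are ours. Up to rounding it reproduces the $\zeta_C$ entries reported for MT and MTB in Table 1.

```python
import math

def random_walk_constants(p):
    """lambda, K, beta_tilde for C = {0} and V(i) = (p/q)**(i/2), as in Section 8.1."""
    q = 1.0 - p
    return 2.0 * math.sqrt(p * q), p + math.sqrt(p * q), p

def zeta_bound_29(lam, K, beta_tilde):
    """Meyn-Tweedie bound (29) on zeta_C."""
    return (32.0 - 8.0 * beta_tilde ** 2) / beta_tilde ** 3 * ((K - lam) / (1.0 - lam)) ** 2

def zeta_bound_30(lam, K, beta_tilde):
    """Bound (30) on zeta_C."""
    return 1.0 + 2.0 * math.log((K - lam) / (1.0 - lam)) / (beta_tilde * math.log(1.0 / lam))

results = {}
for p in (2.0 / 3.0, 0.9):
    lam, K, bt = random_walk_constants(p)
    results[p] = (zeta_bound_29(lam, K, bt), zeta_bound_30(lam, K, bt))
    print(p, results[p])
```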

8.2. Metropolis-Hastings algorithm for the normal distribution. Here we
consider the Markov chain that arises when the Metropolis-Hastings
algorithm with candidate transition probability $q(x, \cdot) = N(x, 1)$ is used to
simulate the standard normal distribution $\pi = N(0,1)$. This example was
studied by Meyn and Tweedie [9]. It also appeared in [15] and [14], where
the emphasis was on convergence of the ergodic average $(1/n)\sum_{k=1}^{n} P^k(x, \cdot)$.

Table 1

               p = 2/3               p = 0.9
Method       ρ         ζ_C         ρ         ζ_C
MT           0.99994   1119        0.9967    78.77
MTB          0.9991    63.55       0.9470    2.764
1.1          0.9994    -           0.9060    -
MT*          0.9965    13          0.9722    7.313
1.2          0.9428    -           0.6       -
LT           0.9428    -           0.6       -

We compare the calculation of Meyn and Tweedie with estimates obtained
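by the coupling method and by our analysis. Since the Hastings-Metropolis
algorithm is by construction reversible, we can use Theorem 1.2. Moreover,
by Lemma 3.1 we can also apply Theorem 1.3. The continuous part of the
transition probability $P(x, \cdot)$ has density

\[ p(x,y) = \begin{cases}
 \dfrac{1}{\sqrt{2\pi}}\exp\Bigl(-\dfrac{(y - x)^2}{2}\Bigr), & \text{if } |x| \ge |y|, \\[2ex]
 \dfrac{1}{\sqrt{2\pi}}\exp\Bigl(-\dfrac{(y - x)^2 + y^2 - x^2}{2}\Bigr), & \text{if } |x| \le |y|.
\end{cases} \]

We use the same family of functions $V(x) = e^{s|x|}$ and sets $C = [-d, d]$ as
used in [9]. Following [9] we get, for $x, s \ge 0$,

\[ \begin{aligned}
\lambda(x,s) := \frac{PV(x)}{V(x)}
 ={}& \exp\Bigl(\frac{s^2}{2}\Bigr)[\Phi(-s) - \Phi(-x - s)]
 + \exp\Bigl(\frac{s^2}{2} - 2sx\Bigr)[\Phi(-x + s) - \Phi(-2x + s)] \\
 &+ \frac{1}{\sqrt 2}\exp\Bigl(\frac{(x - s)^2}{4}\Bigr)\Phi\Bigl(\frac{s - x}{\sqrt 2}\Bigr)
 + \frac{1}{\sqrt 2}\exp\Bigl(\frac{x^2 - 6xs + s^2}{4}\Bigr)\Phi\Bigl(\frac{s - 3x}{\sqrt 2}\Bigr) \\
 &+ \Phi(0) + \Phi(-2x)
 - \frac{1}{\sqrt 2}\exp\Bigl(\frac{x^2}{4}\Bigr)\Bigl[\Phi\Bigl(\frac{-x}{\sqrt 2}\Bigr) + \Phi\Bigl(\frac{-3x}{\sqrt 2}\Bigr)\Bigr],
\end{aligned} \]

where $\Phi$ denotes the standard normal distribution function. Then

\[ \lambda = \max_{|x| \ge d}\lambda(x,s) = \lambda(d,s), \]

\[ K = \max_{|x| \le d} PV(x) = PV(d) = e^{sd}\lambda(d,s) \]

and

\[ b = \max_{|x| \le d}\bigl(PV(x) - \lambda V(x)\bigr) = PV(0) - \lambda V(0) = \lambda(0,s) - \lambda. \]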
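The ratio $\lambda(x,s) = PV(x)/V(x)$ for this sampler can be evaluated directly from its closed form. The Python sketch below is an illustration only: it writes the ratio as the contribution of accepted moves with $|y| \le x$, accepted moves with $|y| > x$, and the rejection probability, using `math.erf` for $\Phi$; the function names are ours.

```python
import math

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def lam_xs(x, s):
    """lambda(x, s) = PV(x)/V(x) for V(x) = exp(s|x|), with x >= 0 and s >= 0."""
    r2 = math.sqrt(2.0)
    # accepted proposals with |y| <= x (always accepted)
    accept_inside = (math.exp(s * s / 2.0) * (Phi(-s) - Phi(-x - s))
                     + math.exp(s * s / 2.0 - 2.0 * s * x)
                     * (Phi(-x + s) - Phi(-2.0 * x + s)))
    # accepted proposals with |y| > x
    accept_outside = (math.exp((x - s) ** 2 / 4.0) * Phi((s - x) / r2)
                      + math.exp((x * x - 6.0 * x * s + s * s) / 4.0)
                      * Phi((s - 3.0 * x) / r2)) / r2
    # probability that the proposal is rejected and the chain stays at x
    reject = (Phi(0.0) + Phi(-2.0 * x)
              - math.exp(x * x / 4.0) * (Phi(-x / r2) + Phi(-3.0 * x / r2)) / r2)
    return accept_inside + accept_outside + reject

def drift_constants(d, s):
    """lambda, K, b for C = [-d, d], using lambda = lambda(d, s)."""
    lam = lam_xs(d, s)
    return lam, math.exp(s * d) * lam, lam_xs(0.0, s) - lam
```

A quick sanity check is that $\lambda(x, 0) = 1$ for every $x$, since $V \equiv 1$ when $s = 0$.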

The computed value for $\rho$ depends on the choices of d and s. In Table 2 we
give optimal values for d and s, and the corresponding value for $1 - \rho$ for five
different methods of calculation. The first line is the calculation reported by
Meyn and Tweedie, using a minorization condition with the measure $\nu$ given
by

\[ \nu(dx) = c \cdot \exp(-x^2)\mathbf{1}_C(x)\,dx \]

for a suitable normalizing constant c. In this case, $\nu(C) = 1$ and we have
$\tilde\beta = \beta = \sqrt 2\exp(-d^2)[\Phi(\sqrt 2 d) - 1/2]$. For the purposes of comparison, the
other four lines were calculated using the same measure.

Table 2

Method         d      s             1 - ρ
MT             1.4    4 × 10^{-5}   1.6 × 10^{-8}
Theorem 1.1    1      0.13          6.3 × 10^{-7}
Coupling       1.8    1.1           0.00068
Theorem 1.2    1      0.07          0.0091
Theorem 1.3    1.1    0.16          0.0253

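In Table 3, we used the measure $\nu$ given by

\[ \beta\nu(dx) = \inf_{y \in C} p(y,x)\,dx = \begin{cases}
 \dfrac{1}{\sqrt{2\pi}}\exp\Bigl(-\dfrac{(|x| + d)^2}{2}\Bigr)dx, & \text{if } |x| \le d, \\[2ex]
 \dfrac{1}{\sqrt{2\pi}}\,e^{-d|x| - |x|^2}\,dx, & \text{if } |x| > d.
\end{cases} \]

Now $\tilde\beta = 2[\Phi(2d) - \Phi(d)]$ and $\beta = \tilde\beta + \sqrt 2\exp(d^2/4)[1 - \Phi(3d/\sqrt 2)]$. In the
calculations for Theorems 1.1 and 1.2 we also used the extra information
that

\[ \tilde K = \nu(C) + \int_{S \setminus C} V(x)\,d\nu(x)
 = \frac{\tilde\beta}{\beta} + \frac{\sqrt 2}{\beta}\exp\Bigl(\frac{(d - s)^2}{4}\Bigr)\Bigl[1 - \Phi\Bigl(\frac{3d - s}{\sqrt 2}\Bigr)\Bigr] \]

in the formula for $\alpha_2$.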


Remark 8.1. For this particular example, it can be verified that the
process $\{|X_n| : n \ge 0\}$ is a stochastically monotone Markov chain. The
coupling result of Roberts and Tweedie ([16], Theorem 2.2) can be adapted
to this situation. The calculation for $\rho$ given by [16] is identical with the
calculation for Theorem 1.3.


Table 3

Method         d      s       1 - ρ
Theorem 1.1    1      0.16    1.7 × 10^{-6}
Coupling       1.9    1.1     0.00187
Theorem 1.2    1      0.11    0.0135
Theorem 1.3    1.1    0.22    0.0333

8.3. Contracting normals. Here we consider the family of Markov chains
with transition probability $P(x, \cdot) = N(\theta x, 1 - \theta^2)$ for some parameter $\theta \in
(-1, 1)$. This family of examples occurs in [18] as one component of a
two-component Gibbs sampler. The convergence of ergodic averages for this
family was studied in [14] and [15]. Since the Markov chain is reversible
with respect to its invariant probability $N(0,1)$, we can apply Theorem 1.2.
We compare these results with the estimates obtained using the coupling
method.

We take $V(x) = 1 + x^2$ and $C = [-c, c]$. Then (A2) is satisfied with $\lambda =
\theta^2 + 2(1 - \theta^2)/(1 + c^2)$ and $K = 2 + \theta^2(c^2 - 1)$. Also $b = \sup_{x \in C} PV(x) -
\lambda V(x) = 2(1 - \theta^2)c^2/(1 + c^2)$. To ensure $\lambda < 1$, we require $c > 1$. For the
minorization condition, we look for a measure $\nu$ concentrated on C, so that
$\tilde\beta = \beta$. We choose $\beta$ and $\nu$ so that



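\[ \beta\nu(dy) = \min_{x \in C}\frac{1}{\sqrt{2\pi(1 - \theta^2)}}\exp\Bigl(-\frac{(y - \theta x)^2}{2(1 - \theta^2)}\Bigr)dy
 = \frac{1}{\sqrt{2\pi(1 - \theta^2)}}\exp\Bigl(-\frac{(|y| + |\theta|c)^2}{2(1 - \theta^2)}\Bigr)dy \]

for $y \in C$. Integrating with respect to y gives

\[ \beta = 2\Bigl[\Phi\Bigl(\frac{(1 + |\theta|)c}{\sqrt{1 - \theta^2}}\Bigr) - \Phi\Bigl(\frac{|\theta|c}{\sqrt{1 - \theta^2}}\Bigr)\Bigr], \]

where $\Phi$ denotes the standard normal distribution function.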
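The drift constants for this family can be checked numerically. In the following Python sketch, which is only an illustration, `PV` uses the identity $PV(x) = 1 + (\theta x)^2 + (1 - \theta^2)$ for $V(x) = 1 + x^2$; the function names are ours, and the pair $\theta = 0.5$, $c = 1.5$ is the one listed for Theorem 1.2 in Table 4. The last line evaluates the coupling constant $\lambda_1 = \lambda + b/(1 + \min\{V(x) : x \notin C\})$ of Section 7.2, taking the minimum of V outside C to be $1 + c^2$.

```python
def drift_constants(theta, c):
    """lambda, K, b for P(x,.) = N(theta*x, 1 - theta**2), V(x) = 1 + x**2, C = [-c, c]."""
    t2 = theta * theta
    lam = t2 + 2.0 * (1.0 - t2) / (1.0 + c * c)
    K = 2.0 + t2 * (c * c - 1.0)
    b = 2.0 * (1.0 - t2) * c * c / (1.0 + c * c)
    return lam, K, b

def PV(theta, x):
    """PV(x) = E[1 + Y**2] for Y ~ N(theta*x, 1 - theta**2)."""
    return 2.0 - theta * theta + (theta * x) ** 2

theta, c = 0.5, 1.5
lam, K, b = drift_constants(theta, c)

# Coupling constant of Section 7.2 with min V outside C equal to 1 + c**2
lam1 = lam + b / (1.0 + (1.0 + c * c))
```

Simplifying the expression for `lam1` gives $\theta^2 + 4(1 - \theta^2)/(2 + c^2)$, so the coupling method needs $c > \sqrt 2$ where (A2) alone needs only $c > 1$.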

For the coupling method, we have $\lambda_1 = \theta^2 + 4(1 - \theta^2)/(2 + c^2)$. To ensure
$\lambda_1 < 1$, we require $c > \sqrt 2$. For the minorization condition in the coupling
method there is no reason to restrict $\nu$ to be supported on C, so we can
adapt the calculation above by integrating y from $-\infty$ to $\infty$ to get

\[ \beta = 2\Bigl[1 - \Phi\Bigl(\frac{|\theta|c}{\sqrt{1 - \theta^2}}\Bigr)\Bigr]. \]



So far, the calculations have depended on $|\theta|$ but not on the sign of $\theta$.
If $\theta > 0$, then $P = Q^2$, where Q has parameter $\sqrt\theta$, so we can apply the
improved estimates of Theorem 1.3. However, if $\theta < 0$, and especially if $\theta$ is
close to $-1$, we can handle the almost periodicity of the chain by considering
its binomial modification with transition kernel $\tilde P = (I + P)/2$; see [20].
Regardless of the sign of $\theta$, we can always apply Theorem 1.3 to the binomial
modification. Replacing P by $(I + P)/2$ with the same V, C and $\nu$ means
replacing $\lambda$ by $(1 + \lambda)/2$, K by $(1 + c^2 + K)/2$ and $\beta$ by $\beta/2$. We let $\tilde\rho$ denote
the estimate obtained by applying Theorem 1.3 to $\tilde P$. Since $2n$ steps of the
binomial modification $\tilde P$ correspond on average to n steps of the original
chain P (see [20], Section 4), for purposes of comparison (see Table 4) we
give the value of $\tilde\rho^2$.

Table 4

                                            Theorem 1.3       Theorem 1.3
          Coupling        Theorem 1.2       θ positive        Binomial mod.
θ         c     ρ         c     ρ           c     ρ           c     ρ̃²
0.5       2.1   0.946     1.5   0.950       1.5   0.897       1.5   0.952
0.75      1.7   0.9963    1.2   0.9958      1.2   0.9847      1.2   0.9924
0.9       1.5   0.99998   1.1   0.99998     1.1   0.99948     1.1   0.99974


8.4. Reflecting random walk, continued. Here we consider the same
random walk as in Section 8.1 except that the boundary transition probabilities
are changed. We redefine $P(0, \{0\}) = \varepsilon$ and $P(0, \{1\}) = 1 - \varepsilon$ for some $\varepsilon > 0$.
If $\varepsilon \ge p$, the Markov chain is stochastically monotone and the results of Lund
and Tweedie [7] apply. Here we concentrate on the case $\varepsilon < p$, which was
studied by Roberts and Tweedie [15] and Fort [4].

To apply Theorem 1.2, we take $V(i) = (p/q)^{i/2}$ and $C = \{0\}$ as earlier.
Then $\lambda = 2\sqrt{pq}$, $K = \varepsilon + (1 - \varepsilon)\sqrt{p/q}$ and $\tilde\beta = \varepsilon$. If $K \le \lambda + 2\varepsilon$ [equivalently
$\varepsilon \ge (p - q)/(1 + \sqrt{q/p})$], we get $\rho = \lambda = 2\sqrt{pq}$. If $\varepsilon < (p - q)/(1 + \sqrt{q/p})$,
then we take $\rho = R^{-1}$, where R solves $1 + 2\varepsilon R = R^{1 + (\log K)/(\log\lambda^{-1})}$.
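The recipe above is straightforward to carry out numerically. The following Python sketch is an illustration only and uses bisection on $(1, \lambda^{-1})$; the function name `rho_theorem_1_2_atomic` is ours. Up to rounding it agrees with several of the entries for $\rho$ in Table 5, for example $(p, \varepsilon) = (0.6, 0.05)$, $(0.9, 0.5)$ and $(0.95, 0.25)$.

```python
import math

def rho_theorem_1_2_atomic(p, eps, iters=200):
    """Estimate of rho for the walk with P(0,{0}) = eps, C = {0}, V(i) = (p/q)**(i/2)."""
    q = 1.0 - p
    lam = 2.0 * math.sqrt(p * q)
    K = eps + (1.0 - eps) * math.sqrt(p / q)
    if K <= lam + 2.0 * eps:
        return lam
    alpha = 1.0 + math.log(K) / math.log(1.0 / lam)

    def gap(R):
        return R ** alpha - 1.0 - 2.0 * eps * R

    # gap(1) = -2*eps < 0 and gap(1/lam) = (K - lam - 2*eps)/lam > 0
    lo, hi = 1.0, 1.0 / lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / lo
```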

For the coupling method, the size of the set C depends on the values of
p and $\varepsilon$. For the set $C = \{0, \ldots, k\}$, the condition $\lambda_1 < 1$ will be satisfied if
and only if $\varepsilon > 1 - (p/q)^{k/2}(p - \sqrt{pq})$. In particular, if $\varepsilon < 1 - (p/q)(p - \sqrt{pq})$,
then $C \supseteq \{0, 1, 2\}$ and there is no minorization condition for the time 1
transition probabilities on C. Instead, as pointed out in [15], it is necessary
to use a minorization condition for the m-step kernel. This program was
recently carried out by Fort. In Table 5 we denote Fort's estimates (taken
from [4]) by $\rho_F$ and our estimates using Theorem 1.2 by $\rho$.

In this example, we can also calculate the exact value for $\rho_V$. We have

\[ b(z) = G(z,0) = \varepsilon z + (1 - \varepsilon)zG(z,1)
 = \varepsilon z + \frac{1 - \varepsilon}{2q}\bigl[1 - (1 - 4pqz^2)^{1/2}\bigr] \]

for $|z| \le 1/\sqrt{4pq}$, where the formula for $G(z,1)$ is taken from [3],
Section XIV.4. The equation $b(z) = 1$ can now be solved explicitly for $|z| \le
1/\sqrt{4pq}$. One solution is $z = 1$. The only other possible solution is in the
interval $(-1/\sqrt{4pq}, -1)$ and exists as long as $b(-1/\sqrt{4pq}) > 1$ [equivalently
as long as $\varepsilon < (p - q)/(1 + \sqrt{q/p})$]. If this condition is satisfied, the second
solution is at $z = -(p - \varepsilon)/[pq + (p - \varepsilon)^2]$. By the argument in Kendall's
theorem, we deduce

\[ \rho_C = \begin{cases}
 \dfrac{pq + (p - \varepsilon)^2}{p - \varepsilon}, & \text{if } \varepsilon < \dfrac{p - q}{1 + \sqrt{q/p}}, \\[2ex]
 2\sqrt{pq}, & \text{otherwise.}
\end{cases} \]

By inspection of this formula we see $\rho_C \ge \lambda$. Since $\rho_C \le \rho_V \le \max(\lambda, \rho_C)$
from (2), we deduce that $\rho_V = \rho_C$ in this example.
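The exact value can be checked against the generating function. The Python sketch below is an illustration only; the function names `b_gen` and `rho_C_exact` are ours. It evaluates the formula for $\rho_C$ and allows one to confirm that $b(z) = 1$ at the stated second root. Up to rounding, the values agree with the $\rho_V$ rows of Table 5.

```python
import math

def b_gen(z, p, eps):
    """Generating function b(z) = eps*z + (1-eps)/(2q) * [1 - sqrt(1 - 4pq z^2)]."""
    q = 1.0 - p
    return eps * z + (1.0 - eps) / (2.0 * q) * (1.0 - math.sqrt(1.0 - 4.0 * p * q * z * z))

def rho_C_exact(p, eps):
    """Exact rho_C (= rho_V) for the reflecting walk with P(0,{0}) = eps."""
    q = 1.0 - p
    if eps < (p - q) / (1.0 + math.sqrt(q / p)):
        return (p * q + (p - eps) ** 2) / (p - eps)
    return 2.0 * math.sqrt(p * q)

# Second root of b(z) = 1 for p = 0.9, eps = 0.5
p, eps = 0.9, 0.5
z_root = -(p - eps) / (p * (1.0 - p) + (p - eps) ** 2)
value_at_root = b_gen(z_root, p, eps)
```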

As $\varepsilon \to 0$, the chain becomes closer and closer to a period 2 chain. This
is the setting where the binomial modification with kernel $\tilde P = (I + P)/2$
should converge significantly faster than the original chain: see [20]. Keeping
the same function $V(x)$ and $C = \{0\}$, and applying Theorem 1.3, we get the
optimal result $\tilde\rho = \tilde\lambda = (1 + \lambda)/2 = 1/2 + \sqrt{pq}$ for all $\varepsilon > 0$. For the purposes
of comparison (see Table 6), we give the values of $\tilde\rho^2$ for the values of p which
appeared in Table 5.


Table 5

             p = 0.6                     p = 0.7                     p = 0.8
ε      0.05    0.25    0.5        0.05    0.25    0.5        0.05    0.25    0.5
ρ_F    0.9997  0.9995  0.9994     0.9964  0.9830  0.9757     0.9793  0.9333  0.9333
ρ      0.9909  0.9798  0.9798     0.9830  0.9165  0.9165     0.9759  0.8796  0.8000
ρ_V    0.9864  0.9798  0.9798     0.9731  0.9165  0.9165     0.9633  0.8409  0.8000

             p = 0.9                     p = 0.95
ε      0.05    0.25    0.5        0.05    0.25    0.5
ρ_F    0.9696  0.8539  0.7500     0.9564  0.7853  0.5814
ρ      0.9687  0.8470  0.6817     0.9645  0.8289  0.6667
ρ_V    0.9559  0.7885  0.6250     0.9528  0.7679  0.5556


Table 6

p      0.6     0.7     0.8     0.9     0.95
ρ̃²     0.9799  0.9186  0.8100  0.6400  0.5154


APPENDIX


Proof of Proposition 4.1. We write $\mathcal{F}_n = \sigma\{X_r : 0 \le r \le n\}$. For
$m \ge 0$, we have

\[ \lambda^{-1}E^x\bigl(V(X_{m+1})\mathbf{1}_{X_{m+1} \notin C} \mid \mathcal{F}_m\bigr)
 + \lambda^{-1}E^x\bigl(V(X_{m+1})\mathbf{1}_{X_{m+1} \in C} \mid \mathcal{F}_m\bigr) \le V(X_m) \]

on the set $\{X_m \notin C\}$. Multiply by $\lambda^{-m}\mathbf{1}_{\tau > m}$, take expectation and sum over
$m = 0$ to $n - 1$ to obtain

\[ \lambda^{-n}E^x\bigl(V(X_n)\mathbf{1}_{\tau > n}\bigr) + E^x\bigl(\lambda^{-\tau}V(X_\tau)\mathbf{1}_{\tau \le n}\bigr) \le V(x) \quad \text{for all } x \notin C \tag{31} \]

or, equivalently,

\[ \lambda^{-n}E^x\bigl(V(X_n)\mathbf{1}_{\tau \ge n}\bigr) + E^x\bigl(\lambda^{-\tau}V(X_\tau)\mathbf{1}_{\tau < n}\bigr) \le V(x) \quad \text{for all } x \notin C. \tag{32} \]

This implies that $P^x(\tau > n) \le \lambda^nV(x)$ for $x \notin C$, which implies (i).

The first assertion in (ii) is obtained by letting $n \to \infty$ in (31) and the
second assertion follows from the first via the identity

\[ G(r,x) = rP(x,C) + r\int_{S \setminus C} P(x,dy)G(r,y). \]

For the calculations to prove (iii) and (iv) it is convenient to define the
function

\[ J(r,x) = E^x\bigl(r^{\tau}V(X_\tau)\bigr). \]

The functions H and J satisfy the identities

\[ H(r,x) = rPV(x) + r\int_{S \setminus C} P(x,dy)H(r,y) \tag{33} \]

and

\[ J(r,x) = r\int_C P(x,dy)V(y) + r\int_{S \setminus C} P(x,dy)J(r,y). \tag{34} \]

For $0 < r < \lambda^{-1}$, multiply (32) by $\lambda^nr^n$ and sum over $n = 1$ to $\infty$. We obtain

\[ H(r,x) + \frac{\lambda r}{1 - \lambda r}J(r,x) \le \frac{\lambda r}{1 - \lambda r}V(x) \quad \text{for all } x \notin C, \tag{35} \]

which gives the first part of (iii). For $x \in C$, we use the inequality (35) in the
right-hand side of the identity (33) along with the identity (34) to obtain

\[ \begin{aligned}
H(r,x) &\le rPV(x) + \frac{\lambda r^2}{1 - \lambda r}\int_{S \setminus C} P(x,dy)[V(y) - J(r,y)] \\
 &= \frac{r}{1 - \lambda r}PV(x) - \frac{\lambda r}{1 - \lambda r}J(r,x) \\
 &\le \frac{r(K - \lambda r)}{1 - \lambda r}.
\end{aligned} \]

This completes (iii). If we replace $\lambda^nr^n$ by $\lambda^n(r^n - 1)$ in the derivation of
(35) we obtain instead

\[ H(r,x) - H(1,x) + \frac{\lambda r}{1 - \lambda r}J(r,x) - \frac{\lambda}{1 - \lambda}J(1,x)
 \le \frac{\lambda(r - 1)}{(1 - \lambda)(1 - r\lambda)}V(x) \tag{36} \]

for $x \notin C$ and $1 < r < \lambda^{-1}$. Using (33), (36) and (34), we get, for $x \in C$,

\[ \begin{aligned}
H(r,x) - rH(1,x) &= r\int_{S \setminus C} P(x,dy)[H(r,y) - H(1,y)] \\
 &\le \frac{\lambda r(r - 1)}{(1 - \lambda)(1 - r\lambda)}\int_{S \setminus C} P(x,dy)V(y)
 - r\int_{S \setminus C} P(x,dy)\Bigl[\frac{\lambda r}{1 - \lambda r}J(r,y) - \frac{\lambda}{1 - \lambda}J(1,y)\Bigr] \\
 &= \frac{\lambda r(r - 1)}{(1 - \lambda)(1 - r\lambda)}PV(x)
 - \lambda r\Bigl[\frac{J(r,x)}{1 - \lambda r} - \frac{J(1,x)}{1 - \lambda}\Bigr]
\end{aligned} \]

and (iv) follows easily. □

Proof of Proposition 4.2. For $z \in \mathbb{C}$, write

\[ G(z,x) = E^x(z^{\tau}), \qquad H(z,x) = E^x\Bigl(\sum_{n=1}^{\tau} z^nV(X_n)\Bigr) \]

and

\[ H_g(z,x) = E^x\Bigl(\sum_{n=1}^{\tau} z^ng(X_n)\Bigr). \]

Let $u(z) = \sum_{n=0}^{\infty} u_nz^n$ be the generating function for the sequence $u_n$.
Suppose $|z| < 1$. The first-entrance-last-exit decomposition ([8], equation (13.46))
yields

\[ \sum_{n=1}^{\infty} P^ng(x)z^n = H_g(z,x) + G(z,x)u(z)H_g(z,a). \]

Furthermore, [8], equation (13.50), gives

\[ \int g\,d\pi = \pi(C)H_g(1,a). \]

Together, for $|z| < 1$ we have

\[ \begin{aligned}
\sum_{n=1}^{\infty}\Bigl(P^ng(x) - \int g\,d\pi\Bigr)z^n
 &= H_g(z,x) + G(z,x)u(z)H_g(z,a) - \frac{z\,\pi(C)H_g(1,a)}{1 - z} \\
 &= H_g(z,x) + G(z,x)\Bigl[u(z) - \frac{\pi(C)}{1 - z}\Bigr]H_g(z,a) \\
 &\quad - \pi(C)H_g(z,a)\,\frac{G(z,x) - 1}{z - 1} - \pi(C)\,\frac{H_g(z,a) - zH_g(1,a)}{z - 1}.
\end{aligned} \tag{37} \]

Now

\[ \begin{aligned}
\Bigl|\frac{H_g(z,a) - zH_g(1,a)}{z - 1}\Bigr|
 &= \Bigl|E^a\Bigl(\sum_{n=1}^{\tau} g(X_n)(z + \cdots + z^{n-1})\Bigr)\Bigr| \\
 &\le E^a\Bigl(\sum_{n=1}^{\tau} V(X_n)(|z| + \cdots + |z|^{n-1})\Bigr) \\
 &\le \frac{H(r,a) - rH(1,a)}{r - 1}
\end{aligned} \]

if $|z| \le r$ and $r > 1$, and a similar estimate holds for $|(G(z,x) - 1)/(z - 1)|$.
Also $\pi(C) = \lim_{n \to \infty} P^a(X_n \in C) = \lim_{n \to \infty} u_n = u_\infty$, so

\[ u(z) - \frac{\pi(C)}{1 - z} = \sum_{n=0}^{\infty}(u_n - u_\infty)z^n \]

and the result now follows easily from (37). □


Proof of Proposition 4.3. Notice that the invariant probability
measure $\pi$ for $\{X_n : n \ge 0\}$ is the S marginal of the stationary probability $\bar\pi$, say,
for the split chain, so that $\int g\,d\pi = \int g\,d\bar\pi$. The argument used in the proof
of Proposition 4.2 gives expressions similar to (37) for
$\sum_{n=1}^{\infty}\bigl(E^{x,i}(g(X_n)) - \int g\,d\bar\pi\bigr)z^n$ for $i = 0, 1$. Multiplying the $i = 0$ expression by $1 - \beta\mathbf{1}_C(x)$
and the $i = 1$ expression by $\beta\mathbf{1}_C(x)$ and adding gives an expression for
$\sum_{n=1}^{\infty}\bigl(P^ng(x) - \int g\,d\pi\bigr)z^n$. The remainder of the proof is exactly as in the
proof of Proposition 4.2. □


To prove Proposition 4.4 we need some intermediate results. We need to
consider the following functions which are defined in terms of
the split chain and the original stopping time $\tau = \inf\{n \ge 1 : X_n \in C\}$. Define

\[ G(r,x,i) = E^{x,i}(r^{\tau}), \qquad H(r,x,i) = E^{x,i}\Bigl(\sum_{n=1}^{\tau} r^nV(X_n)\Bigr). \]

In addition to $\tilde G(r)$ defined in Section 4.2, we define

\[ \tilde H(r) = \sup\{H(r,x,0) : x \in C\}, \]

\[ \tilde H(r,1) = \sup\{H(r,x,0) - rH(1,x,0) : x \in C\}. \]

Lemma A.1. Assume conditions (A1) and (A2). Then

\[ \bar G(r,x,i) \le \frac{\beta G(r,x,i)}{1 - (1 - \beta)\tilde G(r)}, \tag{38} \]

\[ \bar H(r,x,i) \le H(r,x,i) + \frac{(1 - \beta)\tilde H(r)G(r,x,i)}{1 - (1 - \beta)\tilde G(r)} \tag{39} \]

and

\[ \begin{aligned}
\bar H(r,x,i) - r\bar H(1,x,i) \le{}& H(r,x,i) - rH(1,x,i) + \frac{(1 - \beta)\tilde H(r,1)G(r,x,i)}{1 - (1 - \beta)\tilde G(r)} \\
 &+ \frac{(1 - \beta)r\tilde H(1)}{1 - (1 - \beta)\tilde G(r)}\Bigl([G(r,x,i) - 1] + [\tilde G(r) - 1]\frac{1 - \beta}{\beta}\Bigr)
\end{aligned} \tag{40} \]

for all $r > 1$ such that $(1 - \beta)\tilde G(r) < 1$ and $r < \lambda^{-1}$.


Proof. Define the sequence of stopping times $\tau_0 = 0$ and $\tau_k = \tau_{k-1} + \tau \circ \theta(\tau_{k-1})$ for $k \ge 1$ [where $\theta(n)$ denotes the natural time $n$ shift]. Define the random variable $K = \inf\{k \ge 1: Y_{\tau_k} = 1\}$, so that $T = \tau_K$. Then

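\[
\begin{aligned}
E^{x,i}\Bigl(\sum_{n=1}^{T} r^n V(X_n)\Bigr)
&= E^{x,i}\Bigl(\sum_{k=1}^{K}\ \sum_{n=\tau_{k-1}+1}^{\tau_k} r^n V(X_n)\Bigr) \\
&= \sum_{k=1}^{\infty} E^{x,i}\Bigl(\sum_{n=\tau_{k-1}+1}^{\tau_k} r^n V(X_n),\ K \ge k\Bigr) \\
&= \tilde H(r,x,i) + \sum_{k=2}^{\infty} E^{x,i}\Bigl(\sum_{n=\tau_{k-1}+1}^{\tau_k} r^n V(X_n),\ K \ge k\Bigr).
\end{aligned}
\]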


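By conditioning on $\mathcal{G}(\tau_{k-1})$, where $\mathcal{G}(n)$ is the σ-field generated by the split chain up to time $n$ with $Y_n$ not yet revealed, we get

\[
E^{x,i}\Bigl(\sum_{n=\tau_{k-1}+1}^{\tau_k} r^n V(X_n),\ K \ge k\Bigr)
\le (1-\beta)\tilde H(r)\,E^{x,i}\bigl(r^{\tau_{k-1}},\ K \ge k-1\bigr)
\]

and

\[
E^{x,i}\bigl(r^{\tau_k},\ K \ge k\bigr) \le (1-\beta)\tilde G(r)\,E^{x,i}\bigl(r^{\tau_{k-1}},\ K \ge k-1\bigr)
\]

for $k \ge 2$. Together we obtain by induction

\[
\begin{aligned}
E^{x,i}\Bigl(\sum_{n=1}^{T} r^n V(X_n)\Bigr)
&\le \tilde H(r,x,i) + (1-\beta)\tilde H(r)\sum_{k=1}^{\infty} E^{x,i}\bigl(r^{\tau_k},\ K \ge k\bigr) \\
&\le \tilde H(r,x,i) + \tilde H(r)\sum_{k=1}^{\infty} (1-\beta)^k\,\tilde G(r)^{k-1}\,\tilde G(r,x,i),
\end{aligned}
\]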

giving (39). To prove (38), note first that

\[
E^{x,i}(r^{T}) = \sum_{k=1}^{\infty} E^{x,i}\bigl(r^{\tau_k},\ K = k\bigr) = \beta \sum_{k=1}^{\infty} E^{x,i}\bigl(r^{\tau_k},\ K \ge k\bigr),
\]

and the remainder of the proof is the special case of the proof above with $V$ replaced by $\mathbf{1}_C$. To prove (40), we note that for $k \ge 2$,

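\[
\begin{aligned}
E^{x,i}\Bigl(\sum_{n=\tau_{k-1}+1}^{\tau_k} (r^n - r)V(X_n),\ K \ge k\Bigr)
&\le (1-\beta)\tilde H(r,1)\,E^{x,i}\bigl(r^{\tau_{k-1}},\ K \ge k-1\bigr) \\
&\quad + (1-\beta)\tilde H(1)\,r\,E^{x,i}\bigl(r^{\tau_{k-1}} - 1,\ K \ge k-1\bigr)
\end{aligned}
\]

and $P^{x,i}(K \ge k-1) = (1-\beta)^{k-2}$. Then the rest of the proof is essentially the same as for (39). □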
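The closed forms in (38) and (39) come from summing a geometric series with ratio $(1-\beta)\tilde G(r)$. As a minimal sketch, the Python code below evaluates the two right-hand sides for given numerical inputs and compares the closed form in (39) with a truncated series. The function names, argument names and sample values are hypothetical and chosen only for illustration.

```python
def lemma_a1_bounds(beta, g_sup, g_xi, h_sup, h_xi):
    """Right-hand sides of (38) and (39).

    Hypothetical argument names:
      g_sup ~ G~(r), g_xi ~ G~(r, x, i), h_sup ~ H~(r), h_xi ~ H~(r, x, i).
    """
    denom = 1.0 - (1.0 - beta) * g_sup
    if denom <= 0.0:
        raise ValueError("need (1 - beta) * G~(r) < 1")
    bound_38 = beta * g_xi / denom
    bound_39 = h_xi + (1.0 - beta) * h_sup * g_xi / denom
    return bound_38, bound_39


def series_39(beta, g_sup, g_xi, h_sup, h_xi, terms=500):
    """Truncation of H~(r,x,i) + H~(r) * sum_k (1-beta)^k G~(r)^(k-1) G~(r,x,i)."""
    ratio = (1.0 - beta) * g_sup
    total = sum((1.0 - beta) * ratio ** (k - 1) for k in range(1, terms + 1))
    return h_xi + h_sup * total * g_xi


# Sample values only; they satisfy (1 - beta) * g_sup = 0.875 < 1.
beta, g_sup, g_xi, h_sup, h_xi = 0.3, 1.25, 1.1, 4.0, 3.0
b38, b39 = lemma_a1_bounds(beta, g_sup, g_xi, h_sup, h_xi)
partial = series_39(beta, g_sup, g_xi, h_sup, h_xi)
print(b38, b39, partial)
```

The check only illustrates the summation step; it says nothing about how the inputs $\tilde G$ and $\tilde H$ are obtained for a particular chain.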

Lemma A.2. Assume conditions (A1) and (A2), and let $\alpha_1$ and $\alpha_2$ be given by (18) and (20) of Proposition 4.4. Then for $1 \le r \le \lambda^{-1}$,

\[
\tilde G(r) \le r^{\alpha_1},
\]

\[
\tilde G(r,a,1) \le r^{\alpha_2}.
\]

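Proof. For $x \in C$ we have

\[
\begin{aligned}
(1-\beta)\tilde G(\lambda^{-1},x,0)
&= \lambda^{-1}\Bigl[ P(x,C) - \beta\nu(C) + \int_{S\setminus C} G(\lambda^{-1},y)\,[P(x,dy) - \beta\nu(dy)] \Bigr] \\
&\le \lambda^{-1}\Bigl[ P(x,C) - \beta\nu(C) + \int_{S\setminus C} V(y)\,[P(x,dy) - \beta\nu(dy)] \Bigr] \\
&\le \lambda^{-1}\Bigl[ PV(x) - \beta\int_S V\,d\nu \Bigr] \\
&\le \lambda^{-1}(K - \beta).
\end{aligned}
\]

Therefore, for $1 \le r \le \lambda^{-1}$,

\[
\tilde G(r) = \sup_{x\in C} \tilde G(r,x,0)
\le \sup_{x\in C} \bigl(\tilde G(\lambda^{-1},x,0)\bigr)^{(\log r)/(\log \lambda^{-1})}
\le \left(\frac{\lambda^{-1}(K-\beta)}{1-\beta}\right)^{(\log r)/(\log \lambda^{-1})}.
\]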

(This estimate on $\tilde G(r)$ appears as Theorem 2.2 in [15].) The minorization condition implies $\beta\int_S V\,d\nu \le PV(x) \le K$ for $x \in C$ and so $\int_S V\,d\nu \le K/\beta$. We have


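\[
\tilde G(\lambda^{-1},a,1) = \lambda^{-1}\nu(C) + \lambda^{-1}\int_{S\setminus C} G(\lambda^{-1},y)\,\nu(dy)
\le \lambda^{-1}\int_S V(y)\,\nu(dy)
\le \frac{K}{\lambda\beta}
\]

and so, for $1 \le r \le \lambda^{-1}$,

\[
\tilde G(r,a,1) \le \left(\frac{K}{\lambda\beta}\right)^{(\log r)/(\log \lambda^{-1})}
\]

and the proof is complete. □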
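Both estimates in this proof pass from $r = \lambda^{-1}$ to $1 \le r \le \lambda^{-1}$ through Jensen's inequality: since $s = (\log r)/(\log \lambda^{-1}) \in [0,1]$, one has $E(r^{\tau}) = E\bigl((\lambda^{-\tau})^{s}\bigr) \le \bigl(E(\lambda^{-\tau})\bigr)^{s}$. The Python sketch below illustrates this interpolation under a purely illustrative assumption, namely that $\tau$ is geometric on $\{1, 2, \ldots\}$; nothing in the paper requires this, and all names and parameter values are hypothetical.

```python
import math


def mgf_geometric(p: float, r: float) -> float:
    """E[r**tau] for tau geometric on {1, 2, ...} with success probability p."""
    q = 1.0 - p
    if q * r >= 1.0:
        raise ValueError("E[r**tau] is infinite unless (1 - p) * r < 1")
    return p * r / (1.0 - q * r)


def interpolated_bound(m: float, lam: float, r: float) -> float:
    """m ** (log r / log(1/lam)), where m is a bound on E[lam**(-tau)]."""
    return m ** (math.log(r) / math.log(1.0 / lam))


lam, p = 0.8, 0.5                            # illustrative values, 1/lam = 1.25
m = mgf_geometric(p, 1.0 / lam)              # E[lam**(-tau)] for this toy tau
rs = [1.0 + 0.025 * j for j in range(11)]    # grid in [1, 1/lam]
gaps = [interpolated_bound(m, lam, r) - mgf_geometric(p, r) for r in rs]

# The bound is a power of r: m ** (log r / log(1/lam)) = r ** alpha.
alpha = math.log(m) / math.log(1.0 / lam)
print(min(gaps), alpha)
```

Writing the bound as $r^{\alpha}$ with $\alpha = \log m / \log \lambda^{-1}$ is the same rewriting that turns the displays above into powers of $r$.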

Proof of Proposition 4.4 and Remark 4.1. It is clear from the proof of Lemma A.2 that its assertions remain valid when $\alpha_2$ is chosen according to Remark 4.1. The inequality (19) is part of the statement of Lemma A.2, and inequalities (21) and (22) are immediate consequences of Lemmas A.1 and A.2. The result (23) uses the estimate


\[
(1-\beta)\tilde H(r) \le \sup_{x\in C} H(r,x) - \beta r \le \frac{r\,[K - r\lambda - \beta(1 - r\lambda)]}{1 - r\lambda}
\]


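from Lemma A.1. To obtain (24), notice first that

\[
\begin{aligned}
\tilde H(r,a,1) + \frac{(1-\beta)\tilde H(r)\,\tilde G(r,a,1)}{1 - (1-\beta)\tilde G(r)}
&= \frac{1}{1 - (1-\beta)\tilde G(r)} \Bigl[ \tilde G(r,a,1)\sup_{x\in C} H(r,x) + \tilde H(r,a,1)\Bigl(1 - \sup_{x\in C} G(r,x)\Bigr) \Bigr] \\
&\le \frac{1}{1 - (1-\beta)\tilde G(r)}\,\tilde G(r,a,1)\sup_{x\in C} H(r,x).
\end{aligned}
\]

The proof of (25) is similar, using the inequality

\[
\tilde H(r,a,1) - r\tilde H(1,a,1) + \frac{(1-\beta)\tilde H(r,1)\,\tilde G(r,a,1)}{1 - (1-\beta)\tilde G(r)}
\le \frac{1}{1 - (1-\beta)\tilde G(r)}\,\tilde G(r,a,1)\sup_{x\in C}\bigl[H(r,x) - rH(1,x)\bigr].
\]

□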
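The estimate used for (23) is the bound $\sup_{x\in C} H(r,x) \le r(K - r\lambda)/(1 - r\lambda)$ with $\beta r$ subtracted from both sides. The short Python sketch below checks that algebra numerically and shows how the bound grows as $r$ approaches $\lambda^{-1}$; the function names and the sample values of $K$, $\lambda$ and $\beta$ are hypothetical.

```python
def sup_h_bound(K: float, lam: float, r: float) -> float:
    """r (K - r*lam) / (1 - r*lam), the bound on sup_{x in C} H(r, x) implied by the estimate."""
    if not (1.0 < r < 1.0 / lam):
        raise ValueError("need 1 < r < 1/lam")
    return r * (K - r * lam) / (1.0 - r * lam)


def estimate_23(K: float, lam: float, beta: float, r: float) -> float:
    """r [K - r*lam - beta (1 - r*lam)] / (1 - r*lam), the right-hand side of the estimate."""
    if not (1.0 < r < 1.0 / lam):
        raise ValueError("need 1 < r < 1/lam")
    return r * (K - r * lam - beta * (1.0 - r * lam)) / (1.0 - r * lam)


K, lam, beta = 3.0, 0.8, 0.3      # sample values only; 1/lam = 1.25
rs = [1.05, 1.1, 1.2, 1.24]
values = [estimate_23(K, lam, beta, r) for r in rs]
diffs = [abs(sup_h_bound(K, lam, r) - beta * r - v) for r, v in zip(rs, values)]
print(values, max(diffs))
```

For these sample values the right-hand side increases along the grid, reflecting the factor $1/(1 - r\lambda)$.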